(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property Organization 

International Bureau 







(43) International Publication Date: 3 January 2003 (03.01.2003)

(10) International Publication Number: WO 03/000941 A2



(51) International Patent Classification 7: C12N



(21) International Application Number: PCT/DK02/00429

(22) International Filing Date: 26 June 2002 (26.06.2002) 



(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data:
PA 2001 01000 26 June 2001 (26.06.2001) DK



(71) Applicant (for all designated States except US):
NOVOZYMES A/S [DK/DK]; Krogshojvej 36, DK-2880
Bagsvaerd (DK).

(72) Inventors; and

(75) Inventors/Applicants (for US only): LANGE, Lene
[DK/DK]; Karensgade 5, DK-2500 Valby (DK). WU,
Wenping [CN/CN]; Room 103, Er Dan Yuan in Building
#3, Yi Qu, Dong Li Xiao Qu, Shang Di Zone, Haidian
District, 100085 Beijing (CN). AUBERT, Dominique
[FR/DK]; Læssøesgade 18B 2.th, DK-2200 Copenhagen
N (DK). LANDVIK, Sara [DK/DK]; Stockholmsgade
13, st. tv., DK-2100 Copenhagen Ø (DK). SCHNORR,
Kirk, Matthew [US/DK]; Søllerødgårdsvej 38, DK-2840
Holte (DK). CLAUSEN, Ib, Groth [DK/DK]; Fyrrestien
6, DK-3400 Birkerød (DK).



(74) Common Representative: NOVOZYMES A/S; Att: 
Patents, Krogshojvej 36, DK-2880 Bagsvaerd (DK). 

(81) Designated States (national): AE, AG, AL, AM, AT, AU,
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU,
CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH,
GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC,
LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW,
MX, MZ, NO, NZ, OM, PH, PL, PT, RO, RU, SD, SE, SG,
SI, SK, SL, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ,
VN, YU, ZA, ZM, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZM, ZW),
Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
European patent (AT, BE, CH, CY, DE, DK, ES, FI, FR,
GB, GR, IE, IT, LU, MC, NL, PT, SE, TR), OAPI patent
(BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR,
NE, SN, TD, TG).

Published: 

— without international search report and to be republished 
upon receipt of that report 

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning
of each regular issue of the PCT Gazette.






(54) Title: POLYPEPTIDES HAVING CELLOBIOHYDROLASE I ACTIVITY AND POLYNUCLEOTIDES ENCODING SAME

(57) Abstract: The present invention relates to polypeptides having cellobiohydrolase I activity and polynucleotides having
a nucleotide sequence which encodes for the polypeptides. The invention also relates to nucleic acid constructs, vectors, and host
cells comprising the nucleic acid constructs as well as methods for producing and using the polypeptides.



WO 03/000941 PCT/DK02/00429 
POLYPEPTIDES HAVING CELLOBIOHYDROLASE I ACTIVITY 
AND POLYNUCLEOTIDES ENCODING SAME 



Field of the Invention 

The present invention relates to polypeptides having cellobiohydrolase I (also referred to
as CBH I or CBH 1) activity and polynucleotides having a nucleotide sequence which encodes
for the polypeptides. The invention also relates to nucleic acid constructs, vectors, and host
cells comprising the nucleic acid constructs as well as methods for producing and using the
polypeptides.

Background of the Invention 

Cellulose is an important industrial raw material and a source of renewable energy. The 
physical structure and morphology of native cellulose are complex and the fine details of its 
structure have been difficult to determine experimentally. However, the chemical composition
of cellulose is simple, consisting of D-glucose residues linked by beta-1,4-glycosidic bonds to
form linear polymers with chain lengths of over 10,000 glycosidic residues.

In order to be efficient, the digestion of cellulose requires several types of enzymes 
acting cooperatively. At least three categories of enzymes are necessary to convert cellulose 
into glucose: endo (1,4)-beta-D-glucanases (EC 3.2.1.4) that cut the cellulose chains at
random; cellobiohydrolases (EC 3.2.1.91) which cleave cellobiosyl units from the cellulose
chain ends; and beta-glucosidases (EC 3.2.1.21) that convert cellobiose and soluble
cellodextrins into glucose. Among these three categories of enzymes involved in the 
biodegradation of cellulose, cellobiohydrolases are the key enzymes for the degradation of 
native crystalline cellulose. 

Exo-cellobiohydrolases (Cellobiohydrolase I, or CBH I) refer to the cellobiohydrolases
which degrade cellulose by hydrolyzing the cellobiose from the non-reducing end of the
cellulose polymer chains.

It is an object of the present invention to provide improved polypeptides having 
cellobiohydrolase I activity and polynucleotides encoding the polypeptides. The improved
polypeptides may have improved specific activity and/or improved stability, in particular
improved thermostability. The polypeptides may also have an improved ability to resist 
inhibition by cellobiose. 

Summary of the Invention 

In a first aspect the present invention relates to a polypeptide having cellobiohydrolase I
activity, selected from the group consisting of:

(a) a polypeptide comprising an amino acid sequence selected from the group consisting of:
an amino acid sequence which has at least 80% identity with amino acids 1 to 526 of 
SEQ ID NO:2, 

an amino acid sequence which has at least 80% identity with amino acids 1 to 529 of
SEQ ID NO:4,

an amino acid sequence which has at least 80% identity with amino acids 1 to 451 of 
SEQ ID NO:6, 

an amino acid sequence which has at least 80% identity with amino acids 1 to 457 of 
SEQ ID NO:8, 

an amino acid sequence which has at least 80% identity with amino acids 1 to 538 of
SEQ ID NO:10,

an amino acid sequence which has at least 70% identity with amino acids 1 to 415 of 
SEQ ID NO:12, 

an amino acid sequence which has at least 70% identity with amino acids 1 to 447 of
SEQ ID NO:14,

an amino acid sequence which has at least 80% identity with amino acids 1 to 452 of
SEQ ID NO:16,

an amino acid sequence which has at least 80% identity with amino acids 1 to 454 of 
SEQ ID NO:38, 

an amino acid sequence which has at least 80% identity with amino acids 1 to 458 of
SEQ ID NO:40,

an amino acid sequence which has at least 80% identity with amino acids 1 to 450 of 
SEQ ID NO:42, 

an amino acid sequence which has at least 80% identity with amino acids 1 to 446 of
SEQ ID NO:44,

an amino acid sequence which has at least 80% identity with amino acids 1 to 527 of 
SEQ ID NO:46, 

an amino acid sequence which has at least 80% identity with amino acids 1 to 455 of 
SEQ ID NO:48, 

an amino acid sequence which has at least 80% identity with amino acids 1 to 464 of
SEQ ID NO:50,

an amino acid sequence which has at least 80% identity with amino acids 1 to 460 of
SEQ ID NO:52,

an amino acid sequence which has at least 80% identity with amino acids 1 to 450 of
SEQ ID NO:54,

an amino acid sequence which has at least 80% identity with amino acids 1 to 532 of
SEQ ID NO:56,

an amino acid sequence which has at least 80% identity with amino acids 1 to 460 of
SEQ ID NO:58,

an amino acid sequence which has at least 80% identity with amino acids 1 to 525 of 
SEQ ID NO:60, and 

an amino acid sequence which has at least 80% identity with amino acids 1 to 456 of 
SEQ ID NO:66; 

(b) a polypeptide comprising an amino acid sequence selected from the group consisting of: 
an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Acremonium 
thermophilum, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Chaetomium 
thermophilum, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
the cellobiohydrolase I encoding part of the nucleotide sequence present in Scytalidium
sp.,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Scytalidium 
thermophilum, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in 
Thermoascus aurantiacus, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Thielavia 
australiensis, 

an amino acid sequence which has at least 70% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Verticillium 
tenerum, 

an amino acid sequence which has at least 70% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Neotermes 
castaneus, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
the cellobiohydrolase I encoding part of the nucleotide sequence present in
Melanocarpus albomyces,

an amino acid sequence which has at least 80% identity with the polypeptide encoded
by the cellobiohydrolase I encoding part of the nucleotide sequence present in
Acremonium sp.,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
the cellobiohydrolase I encoding part of the nucleotide sequence present in
Chaetomidium pingtungium,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
the cellobiohydrolase I encoding part of the nucleotide sequence present in
Sporotrichum pruinosum,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
the cellobiohydrolase I encoding part of the nucleotide sequence present in Diplodia
gossypina,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
the cellobiohydrolase I encoding part of the nucleotide sequence present in Trichophaea
saccata,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
the cellobiohydrolase I encoding part of the nucleotide sequence present in
Myceliophthora thermophila,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Exidia 
glandulosa, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
the cellobiohydrolase I encoding part of the nucleotide sequence present in Xylaria
hypoxylon,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
the cellobiohydrolase I encoding part of the nucleotide sequence present in Poitrasia
circinans,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Coprinus 
cinereus, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
the cellobiohydrolase I encoding part of the nucleotide sequence present in
Pseudoplectania nigrella,

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the
nucleotide sequence present in Trichothecium roseum IFO 5372,

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the
nucleotide sequence present in Humicola nigrescens CBS 819.73,

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the
nucleotide sequence present in Cladorrhinum foecundissimum CBS 427.97,

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the
nucleotide sequence present in Diplodia gossypina CBS 247.96,

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the
nucleotide sequence present in Myceliophthora thermophila CBS 117.65,

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the
nucleotide sequence present in Rhizomucor pusillus CBS 109471,

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the
nucleotide sequence present in Meripilus giganteus CBS 521.95,

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the
nucleotide sequence present in Exidia glandulosa CBS 2377.96,

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the
nucleotide sequence present in Xylaria hypoxylon CBS 284.96,

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 
nucleotide sequence present in Trichophaea saccata CBS 804.70, 
an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 
nucleotide sequence present in Chaetomium sp., 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 
nucleotide sequence present in Myceliophthora hinnulea, 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the
nucleotide sequence present in Thielavia cf. microspora,

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 
nucleotide sequence present in Aspergillus sp., 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 
nucleotide sequence present in Scopulariopsis sp., 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 
nucleotide sequence present in Fusarium sp., 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 
nucleotide sequence present in Verticillium sp., and 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the
nucleotide sequence present in Phytophthora infestans;

(c) a polypeptide comprising an amino acid sequence selected from the group consisting of:
an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1578 of SEQ ID NO:1, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1587 of SEQ ID NO:3, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
nucleotides 1 to 1353 of SEQ ID NO:5,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1371 of SEQ ID NO:7, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
nucleotides 1 to 1614 of SEQ ID NO:9,

an amino acid sequence which has at least 70% identity with the polypeptide encoded by
nucleotides 1 to 1245 of SEQ ID NO:11,

an amino acid sequence which has at least 70% identity with the polypeptide encoded by
nucleotides 1 to 1341 of SEQ ID NO:13,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
nucleotides 1 to 1356 of SEQ ID NO:15,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1365 of SEQ ID NO:37, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
nucleotides 1 to 1377 of SEQ ID NO:39,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1353 of SEQ ID NO:41, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1341 of SEQ ID NO:43, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
nucleotides 1 to 1584 of SEQ ID NO:45,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1368 of SEQ ID NO:47, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
nucleotides 1 to 1395 of SEQ ID NO:49,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
nucleotides 1 to 1383 of SEQ ID NO:51,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
nucleotides 1 to 1353 of SEQ ID NO:53,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
nucleotides 1 to 1599 of SEQ ID NO:55,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1383 of SEQ ID NO:57, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
nucleotides 1 to 1578 of SEQ ID NO:59, and

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
nucleotides 1 to 1371 of SEQ ID NO:65;

(d) a polypeptide which is encoded by a nucleotide sequence which hybridizes under high
stringency conditions with a polynucleotide probe selected from the group consisting of:

(i) the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 1 to 1578 of SEQ ID NO:1,
nucleotides 1 to 1587 of SEQ ID NO:3,
nucleotides 1 to 1353 of SEQ ID NO:5,
nucleotides 1 to 1371 of SEQ ID NO:7, 
nucleotides 1 to 1614 of SEQ ID NO:9, 
nucleotides 1 to 1245 of SEQ ID NO:11, 
nucleotides 1 to 1341 of SEQ ID NO:13, 
nucleotides 1 to 1356 of SEQ ID NO:15, 
nucleotides 1 to 1365 of SEQ ID NO:37, 
nucleotides 1 to 1377 of SEQ ID NO:39, 
nucleotides 1 to 1353 of SEQ ID NO:41, 
nucleotides 1 to 1341 of SEQ ID NO:43, 
nucleotides 1 to 1584 of SEQ ID NO:45, 
nucleotides 1 to 1368 of SEQ ID NO:47, 
nucleotides 1 to 1395 of SEQ ID NO:49, 
nucleotides 1 to 1383 of SEQ ID NO:51, 
nucleotides 1 to 1353 of SEQ ID NO:53, 
nucleotides 1 to 1599 of SEQ ID NO:55,
nucleotides 1 to 1383 of SEQ ID NO:57, 
nucleotides 1 to 1578 of SEQ ID NO:59, and 
nucleotides 1 to 1371 of SEQ ID NO:65; 

(ii) the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 1 to 500 of SEQ ID NO:1,
nucleotides 1 to 500 of SEQ ID NO:3,
nucleotides 1 to 500 of SEQ ID NO:5,
nucleotides 1 to 500 of SEQ ID NO:7,
nucleotides 1 to 500 of SEQ ID NO:9,
nucleotides 1 to 500 of SEQ ID NO:11,
nucleotides 1 to 500 of SEQ ID NO:13,
nucleotides 1 to 500 of SEQ ID NO:15,
nucleotides 1 to 500 of SEQ ID NO:37,
nucleotides 1 to 500 of SEQ ID NO:39,
nucleotides 1 to 500 of SEQ ID NO:41,
nucleotides 1 to 500 of SEQ ID NO:43,
nucleotides 1 to 500 of SEQ ID NO:45, 
nucleotides 1 to 500 of SEQ ID NO:47, 
nucleotides 1 to 500 of SEQ ID NO:49, 
nucleotides 1 to 500 of SEQ ID NO:51, 
nucleotides 1 to 500 of SEQ ID NO:53, 
nucleotides 1 to 500 of SEQ ID NO:55, 
nucleotides 1 to 500 of SEQ ID NO:57, 
nucleotides 1 to 500 of SEQ ID NO:59, 
nucleotides 1 to 500 of SEQ ID NO:65, 
nucleotides 1 to 221 of SEQ ID NO:17, 
nucleotides 1 to 239 of SEQ ID NO:18,
nucleotides 1 to 199 of SEQ ID NO:19, 
nucleotides 1 to 191 of SEQ ID NO:20, 
nucleotides 1 to 232 of SEQ ID NO:21, 
nucleotides 1 to 467 of SEQ ID NO:22, 
nucleotides 1 to 534 of SEQ ID NO:23, 
nucleotides 1 to 563 of SEQ ID NO:24, 
nucleotides 1 to 218 of SEQ ID NO:25, 
nucleotides 1 to 492 of SEQ ID NO:26, 
nucleotides 1 to 481 of SEQ ID NO:27, 
nucleotides 1 to 463 of SEQ ID NO:28, 
nucleotides 1 to 513 of SEQ ID NO:29, 
nucleotides 1 to 579 of SEQ ID NO:30, 
nucleotides 1 to 514 of SEQ ID NO:31, 
nucleotides 1 to 477 of SEQ ID NO:32, 
nucleotides 1 to 500 of SEQ ID NO:33, 
nucleotides 1 to 470 of SEQ ID NO:34, 
nucleotides 1 to 491 of SEQ ID NO:35, 
nucleotides 1 to 221 of SEQ ID NO:36, 
nucleotides 1 to 519 of SEQ ID NO:61, 
nucleotides 1 to 497 of SEQ ID NO:62, 
nucleotides 1 to 498 of SEQ ID NO:63, 
nucleotides 1 to 525 of SEQ ID NO:64, and 
nucleotides 1 to 951 of SEQ ID NO:67; and

(iii) the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 1 to 200 of SEQ ID NO:1,
nucleotides 1 to 200 of SEQ ID NO:3,
nucleotides 1 to 200 of SEQ ID NO:5,
nucleotides 1 to 200 of SEQ ID NO:7,
nucleotides 1 to 200 of SEQ ID NO:9,
nucleotides 1 to 200 of SEQ ID NO:11,
nucleotides 1 to 200 of SEQ ID NO:13,
nucleotides 1 to 200 of SEQ ID NO:15,
nucleotides 1 to 200 of SEQ ID NO:37,
nucleotides 1 to 200 of SEQ ID NO:39,
nucleotides 1 to 200 of SEQ ID NO:41,
nucleotides 1 to 200 of SEQ ID NO:43,
nucleotides 1 to 200 of SEQ ID NO:45,
nucleotides 1 to 200 of SEQ ID NO:47,
nucleotides 1 to 200 of SEQ ID NO:49,
nucleotides 1 to 200 of SEQ ID NO:51,
nucleotides 1 to 200 of SEQ ID NO:53,
nucleotides 1 to 200 of SEQ ID NO:55,
nucleotides 1 to 200 of SEQ ID NO:57,
nucleotides 1 to 200 of SEQ ID NO:59, and
nucleotides 1 to 200 of SEQ ID NO:65; and

(e) a fragment of (a), (b) or (c) that has cellobiohydrolase I activity. 
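
The probes in item (d) are defined as the complementary strand of a numbered nucleotide range. As a minimal sketch of how such a probe sequence could be derived in software, the Python below takes a 1-based, inclusive range and returns its reverse complement. The function name and the toy sequence are illustrative assumptions; they are not taken from the sequence listing.

```python
# Illustrative sketch only: the helper name and toy sequence are assumptions,
# not part of the application or its sequence listing.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complementary_strand(seq: str, first: int, last: int) -> str:
    """Reverse complement of nucleotides first..last (1-based, inclusive)."""
    if not (1 <= first <= last <= len(seq)):
        raise ValueError("requested range lies outside the sequence")
    region = seq[first - 1:last]                 # convert to 0-based slicing
    return region.translate(_COMPLEMENT)[::-1]  # complement, then reverse


toy_sequence = "ATGCGTACCTGA"  # hypothetical 12-nt sequence
probe = complementary_strand(toy_sequence, 1, 6)
print(probe)
```

With a real entry, a range such as nucleotides 1 to 500 would be passed as `first=1, last=500`.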

In a second aspect the present invention relates to a polynucleotide having a nucleotide
sequence which encodes for the polypeptide of the invention.

In a third aspect the present invention relates to a nucleic acid construct comprising the
nucleotide sequence, which encodes for the polypeptide of the invention, operably linked to
one or more control sequences that direct the production of the polypeptide in a suitable host.

In a fourth aspect the present invention relates to a recombinant expression vector
comprising the nucleic acid construct of the invention.

In a fifth aspect the present invention relates to a recombinant host cell comprising the
nucleic acid construct of the invention.

In a sixth aspect the present invention relates to a method for producing a polypeptide of
the invention, the method comprising:
(a) cultivating a strain, which in its wild-type form is capable of producing the
polypeptide, to produce the polypeptide; and
(b) recovering the polypeptide.

In a seventh aspect the present invention relates to a method for producing a
polypeptide of the invention, the method comprising:

(a) cultivating a recombinant host cell of the invention under conditions conducive for
production of the polypeptide; and
(b) recovering the polypeptide.

In an eighth aspect the present invention relates to a method for in-situ production of a
polypeptide of the invention, the method comprising:

(a) cultivating a recombinant host cell of the invention under conditions conducive for
production of the polypeptide; and
(b) contacting the polypeptide with a desired substrate without prior recovery of the
polypeptide.

Other aspects of the present invention will be apparent from the description below and
from the appended claims.

Definitions 

Prior to discussing the present invention in further detail, the following terms and
conventions will first be defined:

Substantially pure polypeptide: In the present context, the term "substantially pure
polypeptide" means a polypeptide preparation which contains at the most 10% by weight of
other polypeptide material with which it is natively associated (lower percentages of other
polypeptide material are preferred, e.g. at the most 8% by weight, at the most 6% by weight,
at the most 5% by weight, at the most 4% by weight, at the most 3% by weight, at the most 2% by
weight, at the most 1% by weight, and at the most ½% by weight). Thus, it is preferred that
the substantially pure polypeptide is at least 92% pure, i.e. that the polypeptide constitutes at
least 92% by weight of the total polypeptide material present in the preparation, and higher
percentages are preferred such as at least 94% pure, at least 95% pure, at least 96% pure, at
least 97% pure, at least 98% pure, at least 99%, and at the most 99.5%
pure. The polypeptides disclosed herein are preferably in a substantially pure form. In
particular, it is preferred that the polypeptides disclosed herein are in "essentially pure form",
i.e. that the polypeptide preparation is essentially free of other polypeptide material with which
it is natively associated. This can be accomplished, for example, by preparing the polypeptide
by means of well-known recombinant methods. Herein, the term "substantially pure
polypeptide" is synonymous with the terms "isolated polypeptide" and "polypeptide in isolated
form".
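
The purity figures above are weight percentages of total polypeptide material. The following Python is a small sketch of that arithmetic, assuming the masses of the target polypeptide and of other polypeptide material have already been measured. The function names and example masses are hypothetical.

```python
# Sketch under stated assumptions: masses are measured elsewhere; the function
# names and the 10% default simply mirror the definition in the text.
def purity_percent(target_mass: float, other_mass: float) -> float:
    """Weight percent of the target polypeptide in the total polypeptide material."""
    total = target_mass + other_mass
    if total <= 0:
        raise ValueError("total polypeptide mass must be positive")
    return 100.0 * target_mass / total


def is_substantially_pure(target_mass: float, other_mass: float,
                          max_other_percent: float = 10.0) -> bool:
    """True if other polypeptide material is at most max_other_percent by weight."""
    other_percent = 100.0 - purity_percent(target_mass, other_mass)
    return other_percent <= max_other_percent


print(purity_percent(92.0, 8.0))          # 92 mg target, 8 mg other material
print(is_substantially_pure(92.0, 8.0))
```

Tightening `max_other_percent` (for example to 8, 6 or 5) corresponds to the progressively preferred limits listed in the definition.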

Cellobiohydrolase I activity: The term "cellobiohydrolase I activity" is defined herein as a
cellulose 1,4-beta-cellobiosidase (also referred to as Exo-glucanase, Exo-cellobiohydrolase or
1,4-beta-cellobiohydrolase) activity, as defined in the enzyme class EC 3.2.1.91, which
catalyzes the hydrolysis of 1,4-beta-D-glucosidic linkages in cellulose and cellotetraose,
releasing cellobiose from the non-reducing ends of the chains.

For purposes of the present invention, cellobiohydrolase I activity may be determined
according to the procedure described in Example 2.

In an embodiment, cellobiohydrolase I activity may be determined according to the
procedure described in Deshpande MV et al., Methods in Enzymology, pp. 126-130 (1988):
"Selective Assay for Exo-1,4-Beta-Glucanases". According to this procedure, one unit of
cellobiohydrolase I activity (agluconic bond cleavage activity) is defined as 1.0 µmole of
p-nitrophenol produced per minute at 50°C, pH 5.0.
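
The unit definition above reduces to a rate: micromoles of p-nitrophenol released divided by the reaction time in minutes. A minimal Python sketch of that conversion follows. It assumes the amount of p-nitrophenol has already been quantified and that the assay was run at the stated 50°C and pH 5.0. The function name and the example numbers are hypothetical.

```python
# Minimal sketch: quantification of p-nitrophenol (e.g. from a standard curve)
# is assumed to have been done already; names and values are illustrative.
def cbh1_units(pnp_micromoles: float, minutes: float) -> float:
    """Activity in units, where 1 unit = 1.0 umol p-nitrophenol per minute."""
    if minutes <= 0:
        raise ValueError("reaction time must be positive")
    return pnp_micromoles / minutes


# Example: 3.0 umol of p-nitrophenol released over a 2-minute reaction.
print(cbh1_units(3.0, 2.0))
```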

The polypeptides of the present invention should preferably have at least 20% of the
cellobiohydrolase I activity of a polypeptide consisting of an amino acid sequence selected
from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ
ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:40,
SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID
NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, and SEQ ID NO:66.
In a particularly preferred embodiment, the polypeptides should have at least 40%, such as at
least 50%, preferably at least 60%, such as at least 70%, more preferably at least 80%, such
as at least 90%, most preferably at least 95%, such as about or at least 100% of the
cellobiohydrolase I activity of the polypeptide consisting of the amino acid sequence selected
from the group consisting of amino acids 1 to 526 of SEQ ID NO:2, amino acids 1 to 529 of
SEQ ID NO:4, amino acids 1 to 451 of SEQ ID NO:6, amino acids 1 to 457 of SEQ ID NO:8,
amino acids 1 to 538 of SEQ ID NO:10, amino acids 1 to 415 of SEQ ID NO:12, amino acids 1
to 447 of SEQ ID NO:14, amino acids 1 to 452 of SEQ ID NO:16, amino acids 1 to 454 of SEQ
ID NO:38, amino acids 1 to 458 of SEQ ID NO:40, amino acids 1 to 450 of SEQ ID NO:42,
amino acids 1 to 446 of SEQ ID NO:44, amino acids 1 to 527 of SEQ ID NO:46, amino acids 1
to 455 of SEQ ID NO:48, amino acids 1 to 464 of SEQ ID NO:50, amino acids 1 to 460 of SEQ
ID NO:52, amino acids 1 to 450 of SEQ ID NO:54, amino acids 1 to 532 of SEQ ID NO:56,
amino acids 1 to 460 of SEQ ID NO:58, amino acids 1 to 525 of SEQ ID NO:60, and amino
acids 1 to 456 of SEQ ID NO:66.
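
The activity thresholds above are relative: a candidate polypeptide is compared with a reference polypeptide measured in the same assay. The sketch below expresses that comparison in Python, assuming both activities are given in the same units. The helper names and the numeric examples are hypothetical.

```python
# Sketch only: both activities are assumed to come from the same assay and to be
# expressed in the same units; names and values are illustrative assumptions.
def relative_activity_percent(candidate_units: float, reference_units: float) -> float:
    """Candidate activity as a percentage of the reference activity."""
    if reference_units <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * candidate_units / reference_units


def meets_minimum(candidate_units: float, reference_units: float,
                  minimum_percent: float = 20.0) -> bool:
    """True if the candidate reaches the minimum relative activity (20% by default)."""
    return relative_activity_percent(candidate_units, reference_units) >= minimum_percent


print(relative_activity_percent(0.5, 2.0))
print(meets_minimum(0.5, 2.0))
```

Passing `minimum_percent=40.0`, `50.0` and so on checks the progressively preferred levels.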

Identity: In the present context, the homology between two amino acid sequences or 
between two nucleotide sequences is described by the parameter "identity". 

For purposes of the present invention, the degree of identity between two amino acid
sequences is determined by using the program FASTA included in version 2.0x of the FASTA
program package (see W. R. Pearson and D. J. Lipman (1988), "Improved Tools for Biological
Sequence Analysis", PNAS 85:2444-2448; and W. R. Pearson (1990) "Rapid and Sensitive
Sequence Comparison with FASTP and FASTA", Methods in Enzymology 183:63-98). The
scoring matrix used was BLOSUM50, gap penalty was -12, and gap extension penalty was -2.

The degree of identity between two nucleotide sequences is determined using the same 
algorithm and software package as described above. The scoring matrix used was the identity 
matrix, gap penalty was -16, and gap extension penalty was -4. 
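
The alignment itself is produced by FASTA with the parameters stated above. The Python below is not a reimplementation of FASTA. It is only a sketch of how a percent identity can be read off two sequences that have already been aligned, under the assumption that identity is counted as matching columns divided by aligned columns. FASTA's own reporting convention may differ, and the function name and toy sequences are hypothetical.

```python
# Not FASTA: this only counts identities in an alignment produced elsewhere.
# The parameter dictionaries restate the values given in the text; the
# counting convention and all names are assumptions for illustration.
PROTEIN_PARAMS = {"matrix": "BLOSUM50", "gap_penalty": -12, "gap_extension": -2}
NUCLEOTIDE_PARAMS = {"matrix": "identity", "gap_penalty": -16, "gap_extension": -4}


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")  # toy 10-residue alignment
print(identity, identity >= 80.0)
```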
Fragment: When used herein, a "fragment" of a sequence selected from the group
consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ
ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42,
SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID
NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, and SEQ ID NO:66 is a polypeptide
having one or more amino acids deleted from the amino and/or carboxyl terminus of this
amino acid sequence. Preferably, a fragment is a polypeptide from which the amino acid
sequence corresponding to the "cellulose-binding domain" and/or the "linker domain"
of Trichoderma reesei cellobiohydrolase I, as described in SWISS-PROT accession number
P00725, has been deleted. More preferably, a fragment comprises the amino acid sequence
corresponding to the "catalytic domain" of Trichoderma reesei cellobiohydrolase I as described
in SWISS-PROT accession number P00725. Most preferably, a fragment contains at least 434
amino acid residues, e.g., the amino acid residues selected from the group consisting of amino
acids 1 to 434 of SEQ ID NO:2, amino acids 1 to 434 of SEQ ID NO:4, amino acids 1 to 434 of
SEQ ID NO:6, amino acids 1 to 434 of SEQ ID NO:8, amino acids 1 to 434 of SEQ ID NO:10,
amino acids 1 to 434 of SEQ ID NO:14, amino acids 1 to 434 of SEQ ID NO:16, amino acids 1
to 434 of SEQ ID NO:38, amino acids 1 to 434 of SEQ ID NO:40, amino acids 1 to 434 of SEQ
ID NO:42, amino acids 1 to 434 of SEQ ID NO:44, amino acids 1 to 434 of SEQ ID NO:46,
amino acids 1 to 434 of SEQ ID NO:48, amino acids 1 to 434 of SEQ ID NO:50, amino acids 1
to 434 of SEQ ID NO:52, amino acids 1 to 434 of SEQ ID NO:54, amino acids 1 to 434 of SEQ
ID NO:56, amino acids 1 to 434 of SEQ ID NO:58, amino acids 1 to 434 of SEQ ID NO:60, and
amino acids 1 to 434 of SEQ ID NO:66. In particular, a fragment contains at least 215 amino
acid residues, e.g., the amino acid residues selected from the group consisting of amino acids
200 to 434 of SEQ ID NO:2, amino acids 200 to 434 of SEQ ID NO:4, amino acids 200 to 434
of SEQ ID NO:6, amino acids 200 to 434 of SEQ ID NO:8, amino acids 200 to 434 of SEQ ID
NO:10, amino acids 200 to 415 of SEQ ID NO:12, amino acids 200 to 434 of SEQ ID NO:14,
amino acids 200 to 434 of SEQ ID NO:16, amino acids 200 to 434 of SEQ ID NO:38, amino
acids 200 to 434 of SEQ ID NO:40, amino acids 200 to 434 of SEQ ID NO:42, amino acids
200 to 434 of SEQ ID NO:44, amino acids 200 to 434 of SEQ ID NO:46, amino acids 200 to
434 of SEQ ID NO:48, amino acids 200 to 434 of SEQ ID NO:50, amino acids 200 to 434 of
SEQ ID NO:52, amino acids 200 to 434 of SEQ ID NO:54, amino acids 200 to 434 of SEQ ID
NO:56, amino acids 200 to 434 of SEQ ID NO:58, amino acids 200 to 434 of SEQ ID NO:60,
and amino acids 200 to 434 of SEQ ID NO:66.
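
The residue ranges above use 1-based, inclusive numbering, so a range of 200 to 434 spans 235 residues and a range of 200 to 415 spans 216, both consistent with the stated minimum of 215. A short Python sketch of that bookkeeping follows. The helper names and the toy sequence are assumptions made for illustration and do not come from the sequence listing.

```python
# Illustrative helpers; names and the toy sequence are assumptions.
def span_length(first: int, last: int) -> int:
    """Number of residues in a 1-based, inclusive range."""
    if first < 1 or last < first:
        raise ValueError("invalid residue range")
    return last - first + 1


def fragment(sequence: str, first: int, last: int) -> str:
    """Residues first..last (1-based, inclusive) of an amino acid sequence."""
    if last > len(sequence):
        raise ValueError("range extends past the end of the sequence")
    return sequence[first - 1:first - 1 + span_length(first, last)]


print(span_length(200, 434), span_length(200, 415), span_length(1, 434))
print(fragment("MKTAYIAKQR", 2, 5))  # toy 10-residue sequence
```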
Allelic variant: In the present context, the term "allelic variant" denotes any of two or
more alternative forms of a gene occupying the same chromosomal locus. Allelic variation
arises naturally through mutation, and may result in polymorphism within populations. Gene
mutations can be silent (no change in the encoded polypeptide) or may encode polypeptides
having altered amino acid sequences. An allelic variant of a polypeptide is a polypeptide
encoded by an allelic variant of a gene.

Substantially pure polynucleotide: The term "substantially pure polynucleotide" as used
herein refers to a polynucleotide preparation, wherein the polynucleotide has been removed
from its natural genetic milieu, and is thus free of other extraneous or unwanted coding
sequences and is in a form suitable for use within genetically engineered protein production
systems. Thus, a substantially pure polynucleotide contains at the most 10% by weight of
other polynucleotide material with which it is natively associated (lower percentages of other
polynucleotide material are preferred, e.g. at the most 8% by weight, at the most 6% by
weight, at the most 5% by weight, at the most 4% by weight, at the most 3% by weight, at the
most 2% by weight, at the most 1% by weight, and at the most ½% by weight). A substantially
pure polynucleotide may, however, include naturally occurring 5' and 3' untranslated regions,
such as promoters and terminators. It is preferred that the substantially pure polynucleotide is
at least 92% pure, i.e. that the polynucleotide constitutes at least 92% by weight of the total
polynucleotide material present in the preparation, and higher percentages are preferred such
as at least 94% pure, at least 95% pure, at least 96% pure, at least 97%
pure, at least 98% pure, at least 99%, and at the most 99.5% pure. The polynucleotides
disclosed herein are preferably in a substantially pure form. In particular, it is preferred that the
polynucleotides disclosed herein are in "essentially pure form", i.e. that the polynucleotide
preparation is essentially free of other polynucleotide material with which it is natively
associated. Herein, the term "substantially pure polynucleotide" is synonymous with the terms
"isolated polynucleotide" and "polynucleotide in isolated form".
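
By way of illustration only, the weight-based purity thresholds above can be expressed as a short calculation. The following Python sketch is not part of the disclosure; the function names and the idea of passing the two masses directly are assumptions made for the example.

```python
def purity_percent(target_mass: float, other_mass: float) -> float:
    """Percentage by weight of the polynucleotide of interest in a preparation.

    target_mass: mass of the polynucleotide of interest (any unit).
    other_mass:  mass of other, natively associated polynucleotide material
                 (same unit). Both names are hypothetical.
    """
    total = target_mass + other_mass
    if total <= 0:
        raise ValueError("total polynucleotide mass must be positive")
    return 100.0 * target_mass / total


def is_substantially_pure(target_mass: float, other_mass: float,
                          max_other_percent: float = 10.0) -> bool:
    """True if other polynucleotide material is at most max_other_percent by weight.

    The default of 10% mirrors the broadest limit stated in the definition;
    stricter limits (8%, 6%, ...) can be passed explicitly.
    """
    other_percent = 100.0 - purity_percent(target_mass, other_mass)
    return other_percent <= max_other_percent
```

For instance, a preparation of 95 mass units of the polynucleotide of interest and 5 mass units of other polynucleotide material would satisfy the 10% limit, and would also satisfy the preferred limit of at the most 6% by weight.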

Modification(s): In the context of the present invention the term "modification(s)" is
intended to mean any chemical modification of a polypeptide consisting of an amino acid
sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6,
SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID
NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ
ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60,
and SEQ ID NO:66, as well as genetic manipulation of the DNA encoding that polypeptide.
The modification(s) can be replacement(s) of the amino acid side chain(s), substitution(s),
deletion(s) and/or insertion(s) in or at the amino acid(s) of interest.

Artificial variant: When used herein, the term "artificial variant" means a polypeptide
having cellobiohydrolase I activity, which has been produced by an organism which is
expressing a modified gene as compared to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ
ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:37, SEQ
ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49,
SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, or SEQ ID
NO:65. The modified gene, from which said variant is produced when expressed in a suitable
host, is obtained through human intervention by modification of a nucleotide sequence
selected from the group consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID
NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:37, SEQ ID
NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ
ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, and SEQ ID
NO:65.

cDNA: The term "cDNA" when used in the present context, is intended to cover a DNA 
molecule which can be prepared by reverse transcription from a mature, spliced, mRNA 
molecule derived from a eukaryotic cell. cDNA lacks the intron sequences that are usually
present in the corresponding genomic DNA. The initial, primary RNA transcript is a precursor
to mRNA and it goes through a series of processing events before appearing as mature 
spliced mRNA. These events include the removal of intron sequences by a process called 
splicing. When cDNA is derived from mRNA it therefore lacks intron sequences. 
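
The relationship between a genomic sequence and the corresponding cDNA can be sketched as follows. This Python example is illustrative only and assumes that the intron coordinates are already known; the function name, the 1-based inclusive coordinate convention and the toy sequence are assumptions, not part of the disclosure.

```python
from __future__ import annotations


def remove_introns(genomic: str, introns: list[tuple[int, int]]) -> str:
    """Return the intron-free sequence obtained by joining the exons.

    introns: 1-based inclusive (start, end) positions of each intron in
             `genomic`; the intervals must not overlap.
    """
    exons = []
    position = 0  # 0-based index of the next base not yet consumed
    for start, end in sorted(introns):
        if start - 1 < position or end < start or end > len(genomic):
            raise ValueError(f"invalid or overlapping intron: {(start, end)}")
        exons.append(genomic[position:start - 1])  # exon preceding this intron
        position = end                             # skip over the intron
    exons.append(genomic[position:])               # final exon
    return "".join(exons)
```

In this toy case, removing an intron at positions 4 to 9 of the sequence ATGGTAAGTCCC joins the two flanking exons into ATGCCC.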

Nucleic acid construct: When used herein, the term "nucleic acid construct" means a
nucleic acid molecule, either single- or double-stranded, which is isolated from a naturally
occurring gene or which has been modified to contain segments of nucleic acids in a manner 
that would not otherwise exist in nature. The term nucleic acid construct is synonymous with 
the term "expression cassette" when the nucleic acid construct contains the control sequences 
required for expression of a coding sequence of the present invention. 

Control sequence: The term "control sequences" is defined herein to include all
components, which are necessary or advantageous for the expression of a polypeptide of the
present invention. Each control sequence may be native or foreign to the nucleotide 
sequence encoding the polypeptide. Such control sequences include, but are not limited to, a 
leader, polyadenylation sequence, propeptide sequence, promoter, signal peptide sequence,
and transcription terminator. At a minimum, the control sequences include a promoter, and
transcriptional and translational stop signals. The control sequences may be provided with 
linkers for the purpose of introducing specific restriction sites facilitating ligation of the control 
sequences with the coding region of the nucleotide sequence encoding a polypeptide. 

Operably linked: The term "operably linked" is defined herein as a configuration in which
a control sequence is appropriately placed at a position relative to the coding sequence of the
DNA sequence such that the control sequence directs the expression of a polypeptide.

Coding sequence: When used herein the term "coding sequence" is intended to cover a
nucleotide sequence, which directly specifies the amino acid sequence of its protein product.
The boundaries of the coding sequence are generally determined by an open reading frame,
which usually begins with the ATG start codon. The coding sequence typically includes DNA,
cDNA, and recombinant nucleotide sequences.

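
As a simplified illustration of how an open reading frame delimits a coding sequence, the following Python sketch scans a DNA string for the first ATG that is followed, in the same frame, by a stop codon. It is a minimal example under stated assumptions: only the given strand is searched, the standard stop codons TAA, TAG and TGA are assumed, and the function name is hypothetical.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumed)


def first_open_reading_frame(dna: str) -> str | None:
    """Return the first ATG...stop stretch read in frame, or None if absent."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon from the candidate start codon.
        for i in range(start, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]  # include the stop codon
        # No in-frame stop codon: try the next ATG further downstream.
        start = dna.find("ATG", start + 1)
    return None
```

Real gene prediction must also consider both strands, introns and alternative start codons; the sketch covers none of these.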
Expression: In the present context, the term "expression" includes any step involved in
the production of the polypeptide including, but not limited to, transcription, post-transcriptional
modification, translation, post-translational modification, and secretion.

Expression vector: In the present context, the term "expression vector" covers a DNA
molecule, linear or circular, that comprises a segment encoding a polypeptide of the invention,
and which is operably linked to additional segments that provide for its transcription.

Host cell: The term "host cell", as used herein, includes any cell type which is 
susceptible to transformation with a nucleic acid construct. 

The terms "polynucleotide probe", "hybridization" as well as the various stringency
conditions are defined in the section entitled "Polypeptides Having Cellobiohydrolase I
Activity".

Thermostability: The term "thermostability", as used herein, is measured as described in 
Example 2. 

Detailed Description of the Invention

Polypeptides Having Cellobiohydrolase I Activity 

In a first embodiment, the present invention relates to polypeptides having
cellobiohydrolase I activity and where the polypeptides comprise, preferably consist of, an
amino acid sequence which has a degree of identity to an amino acid sequence selected from
the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID
NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:40, SEQ
ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52,
SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, and SEQ ID NO:66 (i.e., the
mature polypeptide) of at least 65%, preferably at least 70%, e.g. at least 75%, more
preferably at least 80%, such as at least 85%, even more preferably at least 90%, most
preferably at least 95%, e.g. at least 96%, such as at least 97%, and even most preferably at
least 98%, such as at least 99% (hereinafter "homologous polypeptides"). In an interesting
embodiment, the amino acid sequence differs by at the most ten amino acids (e.g. by ten
amino acids), in particular by at the most five amino acids (e.g. by five amino acids), such as
by at the most four amino acids (e.g. by four amino acids), e.g. by at the most three amino
acids (e.g. by three amino acids) from an amino acid sequence selected from the group
consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ
ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42,
SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID
NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, and SEQ ID NO:66. In a particularly
interesting embodiment, the amino acid sequence differs by at the most two amino acids (e.g.
by two amino acids), such as by one amino acid from an amino acid sequence selected from
the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID
NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:40, SEQ
ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52,
SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, and SEQ ID NO:66.

Preferably, the polypeptides of the present invention comprise an amino acid sequence
selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID
NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, SEQ
ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50,
SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, and SEQ ID
NO:66; an allelic variant thereof; or a fragment thereof that has cellobiohydrolase I activity. In
another preferred embodiment, the polypeptide of the present invention consists of an amino
acid sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID
NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID
NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ
ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60,
and SEQ ID NO:66.

The polypeptide of the invention may be a wild-type cellobiohydrolase I identified and
isolated from a natural source. Such wild-type polypeptides may be specifically screened for
by standard techniques known in the art, such as molecular screening as described in
Example 1. Furthermore, the polypeptide of the invention may be prepared by the DNA
shuffling technique, such as described in J.E. Ness et al. Nature Biotechnology 17, 893-896
(1999). Moreover, the polypeptide of the invention may be an artificial variant which comprises,
preferably consists of, an amino acid sequence that has at least one substitution, deletion
and/or insertion of an amino acid as compared to an amino acid sequence selected from the
group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10,
SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:40, SEQ ID
NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ
ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, and SEQ ID NO:66. Such artificial
variants may be constructed by standard techniques known in the art, such as by
site-directed/random mutagenesis of the polypeptide comprising an amino acid sequence selected
from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ
ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:40,
SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID
NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, and SEQ ID NO:66.
In one embodiment of the invention, amino acid changes (in the artificial variant as well as in
wild-type polypeptides) are of a minor nature, that is conservative amino acid substitutions that
do not significantly affect the folding and/or activity of the protein; small deletions, typically of
one to about 30 amino acids; small amino- or carboxyl-terminal extensions, such as an
amino-terminal methionine residue; a small linker peptide of up to about 20-25 residues; or a small
extension that facilitates purification by changing net charge or another function, such as a
poly-histidine tract, an antigenic epitope or a binding domain.

Examples of conservative substitutions are within the group of basic amino acids
(arginine, lysine and histidine), acidic amino acids (glutamic acid and aspartic acid), polar
amino acids (glutamine and asparagine), hydrophobic amino acids (leucine, isoleucine, valine
and methionine), aromatic amino acids (phenylalanine, tryptophan and tyrosine), and small
amino acids (glycine, alanine, serine and threonine). Amino acid substitutions which do not
generally alter the specific activity are known in the art and are described, for example, by H.
Neurath and R.L. Hill, 1979, In, The Proteins, Academic Press, New York. The most
commonly occurring exchanges are Ala/Ser, Val/Ile, Asp/Glu, Thr/Ser, Ala/Gly, Ala/Thr,
Ser/Asn, Ala/Val, Ser/Gly, Tyr/Phe, Ala/Pro, Lys/Arg, Asp/Asn, Leu/Ile, Leu/Val, Ala/Glu, and
Asp/Gly as well as these in reverse.
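
The six groups of conservative substitutions listed above can be written as a simple lookup. The following Python sketch is for illustration only; the use of one-letter amino acid codes and the function name are assumptions. The list of most commonly occurring exchanges is a separate observation and is deliberately not encoded here.

```python
from __future__ import annotations

# Groups taken from the text above, written with one-letter amino acid codes.
CONSERVATIVE_GROUPS = {
    "basic": {"R", "K", "H"},             # arginine, lysine, histidine
    "acidic": {"E", "D"},                 # glutamic acid, aspartic acid
    "polar": {"Q", "N"},                  # glutamine, asparagine
    "hydrophobic": {"L", "I", "V", "M"},  # leucine, isoleucine, valine, methionine
    "aromatic": {"F", "W", "Y"},          # phenylalanine, tryptophan, tyrosine
    "small": {"G", "A", "S", "T"},        # glycine, alanine, serine, threonine
}


def conservative_group(original: str, replacement: str) -> str | None:
    """Return the name of the group containing both residues, or None.

    A substitution of a residue by itself is not a substitution and
    returns None.
    """
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return None
    for name, members in CONSERVATIVE_GROUPS.items():
        if original in members and replacement in members:
            return name
    return None
```

Under this grouping, a Lys to Arg exchange falls within the basic group, whereas an Asp to Lys exchange falls within no group and would not count as conservative in the sense of the groups listed.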

In an interesting embodiment of the invention, the amino acid changes are of such a
nature that the physico-chemical properties of the polypeptides are altered. For example,
amino acid changes may be performed, which improve the thermal stability of the polypeptide,
which alter the substrate specificity, which change the pH optimum, and the like.

Preferably, the number of such substitutions, deletions and/or insertions as compared to
an amino acid sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4,
SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16,
SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID
NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ
ID NO:60, and SEQ ID NO:66 is at the most 10, such as at the most 9, e.g. at the most 8,
more preferably at the most 7, e.g. at the most 6, such as at the most 5, most preferably at the
most 4, e.g. at the most 3, such as at the most 2, in particular at the most 1.
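
One possible way of counting substitutions, deletions and insertions between a reference sequence and a variant is the edit (Levenshtein) distance, which gives the smallest number of such single-residue changes that converts one sequence into the other. The text above does not prescribe this or any other counting method; the Python sketch below is an assumption made for illustration, and the function names are hypothetical.

```python
def count_changes(reference: str, variant: str) -> int:
    """Minimum number of single-residue substitutions, deletions and
    insertions needed to turn `reference` into `variant`."""
    # previous[j] = changes needed to turn the reference prefix seen so far
    # into the first j residues of the variant.
    previous = list(range(len(variant) + 1))
    for i, ref_res in enumerate(reference, start=1):
        current = [i]
        for j, var_res in enumerate(variant, start=1):
            cost = 0 if ref_res == var_res else 1
            current.append(min(previous[j] + 1,          # delete ref_res
                               current[j - 1] + 1,       # insert var_res
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def within_limit(reference: str, variant: str, limit: int = 10) -> bool:
    """True if the variant differs by at the most `limit` changes."""
    return count_changes(reference, variant) <= limit
```

The default limit of 10 corresponds to the broadest range stated above; the narrower preferred ranges can be checked by passing a smaller limit.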

The present inventors have isolated nucleotide sequences encoding polypeptides having
cellobiohydrolase I activity from the microorganisms selected from the group consisting of
Acremonium thermophilum, Chaetomium thermophilum, Scytalidium sp., Scytalidium
thermophilum, Thermoascus aurantiacus, Thielavia australiensis, Verticillium tenerum,
Melanocarpus albomyces, Poitrasia circinans, Coprinus cinereus, Trichothecium roseum,
Humicola nigrescens, Cladorrhinum foecundissimum, Diplodia gossypina, Myceliophthora
thermophila, Rhizomucor pusillus, Meripilus giganteus, Exidia glandulosa, Xylaria hypoxylon,
Trichophaea saccata, Acremonium sp., Chaetomium sp., Chaetomidium pingtungium,
Myceliophthora thermophila, Myceliophthora hinnulea, Sporotrichum pruinosum, Thielavia cf.
microspora, Aspergillus sp., Scopulariopsis sp., Fusarium sp., Verticillium sp.,
Pseudoplectania nigrella, and Phytophthora infestans; and from the gut of the termite larvae
Neotermes castaneus. Thus, in a second embodiment, the present invention relates to
polypeptides comprising an amino acid sequence which has at least 65% identity with the
polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence
present in an organism selected from the group consisting of Acremonium thermophilum,
Chaetomium thermophilum, Scytalidium sp., Scytalidium thermophilum, Thermoascus
aurantiacus, Thielavia australiensis, Verticillium tenerum, Neotermes castaneus,
Melanocarpus albomyces, Poitrasia circinans, Coprinus cinereus, Trichothecium roseum IFO
5372, Humicola nigrescens CBS 819.73, Cladorrhinum foecundissimum CBS 427.97, Diplodia
gossypina CBS 247.96, Myceliophthora thermophila CBS 117.65, Rhizomucor pusillus CBS
109471, Meripilus giganteus CBS 521.95, Exidia glandulosa CBS 2377.96, Xylaria hypoxylon
CBS 284.96, Trichophaea saccata CBS 804.70, Acremonium sp., Chaetomium sp.,
Chaetomidium pingtungium, Myceliophthora thermophila, Myceliophthora hinnulea,
Sporotrichum pruinosum, Thielavia cf. microspora, Aspergillus sp., Scopulariopsis sp.,
Fusarium sp., Verticillium sp., Pseudoplectania nigrella, and Phytophthora infestans. In an
interesting embodiment of the invention, the polypeptide comprises an amino acid sequence
which has at least 70%, e.g. at least 75%, preferably at least 80%, such as at least 85%, more
preferably at least 90%, most preferably at least 95%, e.g. at least 96%, such as at least 97%,
and even most preferably at least 98%, such as at least 99% identity with the polypeptide
encoded by the cellobiohydrolase I encoding part of the nucleotide sequence present in an
organism selected from the group consisting of Acremonium thermophilum, Chaetomium
thermophilum, Scytalidium sp., Scytalidium thermophilum, Thermoascus aurantiacus,
Thielavia australiensis, Verticillium tenerum, Neotermes castaneus, Melanocarpus albomyces,
Poitrasia circinans, Coprinus cinereus, Trichothecium roseum IFO 5372, Humicola nigrescens
CBS 819.73, Cladorrhinum foecundissimum CBS 427.97, Diplodia gossypina CBS 247.96,
Myceliophthora thermophila CBS 117.65, Rhizomucor pusillus CBS 109471, Meripilus
giganteus CBS 521.95, Exidia glandulosa CBS 2377.96, Xylaria hypoxylon CBS 284.96,
Trichophaea saccata CBS 804.70, Acremonium sp., Chaetomium sp., Chaetomidium
pingtungium, Myceliophthora thermophila, Myceliophthora hinnulea, Sporotrichum pruinosum,
Thielavia cf. microspora, Aspergillus sp., Scopulariopsis sp., Fusarium sp., Verticillium sp.,
Pseudoplectania nigrella, and Phytophthora infestans (hereinafter "homologous
polypeptides"). In an interesting embodiment, the amino acid sequence differs by at the most
ten amino acids (e.g. by ten amino acids), in particular by at the most five amino acids (e.g. by
five amino acids), such as by at the most four amino acids (e.g. by four amino acids), e.g. by
at the most three amino acids (e.g. by three amino acids) from the polypeptide encoded by the
cellobiohydrolase I encoding part of the nucleotide sequence present in an organism selected
from the group consisting of Acremonium thermophilum, Chaetomium thermophilum,
Scytalidium sp., Scytalidium thermophilum, Thermoascus aurantiacus, Thielavia australiensis,
Verticillium tenerum, Neotermes castaneus, Melanocarpus albomyces, Poitrasia circinans,
Coprinus cinereus, Trichothecium roseum IFO 5372, Humicola nigrescens CBS 819.73,
Cladorrhinum foecundissimum CBS 427.97, Diplodia gossypina CBS 247.96, Myceliophthora
thermophila CBS 117.65, Rhizomucor pusillus CBS 109471, Meripilus giganteus CBS 521.95,
Exidia glandulosa CBS 2377.96, Xylaria hypoxylon CBS 284.96, Trichophaea saccata CBS
804.70, Acremonium sp., Chaetomium sp., Chaetomidium pingtungium, Myceliophthora
thermophila, Myceliophthora hinnulea, Sporotrichum pruinosum, Thielavia cf. microspora,
Aspergillus sp., Scopulariopsis sp., Fusarium sp., Verticillium sp., Pseudoplectania nigrella,
and Phytophthora infestans. In a particularly interesting embodiment, the amino acid sequence
differs by at the most two amino acids (e.g. by two amino acids), such as by one amino acid
from the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide
sequence present in an organism selected from the group consisting of Acremonium
thermophilum, Chaetomium thermophilum, Scytalidium sp., Scytalidium thermophilum,
Thermoascus aurantiacus, Thielavia australiensis, Verticillium tenerum, Neotermes castaneus,
Melanocarpus albomyces, Poitrasia circinans, Coprinus cinereus, Trichothecium roseum IFO
5372, Humicola nigrescens CBS 819.73, Cladorrhinum foecundissimum CBS 427.97, Diplodia
gossypina CBS 247.96, Myceliophthora thermophila CBS 117.65, Rhizomucor pusillus CBS
109471, Meripilus giganteus CBS 521.95, Exidia glandulosa CBS 2377.96, Xylaria hypoxylon
CBS 284.96, Trichophaea saccata CBS 804.70, Acremonium sp., Chaetomium sp.,
Chaetomidium pingtungium, Myceliophthora thermophila, Myceliophthora hinnulea,
Sporotrichum pruinosum, Thielavia cf. microspora, Aspergillus sp., Scopulariopsis sp.,
Fusarium sp., Verticillium sp., Pseudoplectania nigrella, and Phytophthora infestans.

Preferably, the polypeptides of the present invention comprise the amino acid sequence
of the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide
sequence inserted into a plasmid present in a deposited microorganism selected from the
group consisting of CGMCC No. 0584, CGMCC No. 0581, CGMCC No. 0585, CGMCC No.
0582, CGMCC No. 0583, CBS 109513, DSM 14348, CGMCC No. 0580, DSM 15064, DSM
15065, DSM 15066, DSM 15067, CGMCC No. 0747, CGMCC No. 0748, CGMCC No. 0749,
and CGMCC No. 0750. In another preferred embodiment, the polypeptide of the present
invention consists of the amino acid sequence of the polypeptide encoded by the
cellobiohydrolase I encoding part of the nucleotide sequence inserted into a plasmid present in
a deposited microorganism selected from the group consisting of CGMCC No. 0584, CGMCC
No. 0581, CGMCC No. 0585, CGMCC No. 0582, CGMCC No. 0583, CBS 109513, DSM
14348, CGMCC No. 0580, DSM 15064, DSM 15065, DSM 15066, DSM 15067, CGMCC
No. 0747, CGMCC No. 0748, CGMCC No. 0749, and CGMCC No. 0750.

In a similar way as described above, the polypeptide of the invention may be an artificial
variant which comprises, preferably consists of, an amino acid sequence that has at least one
substitution, deletion and/or insertion of an amino acid as compared to the amino acid
sequence encoded by the cellobiohydrolase I encoding part of the nucleotide sequence
inserted into a plasmid present in a deposited microorganism selected from the group
consisting of CGMCC No. 0584, CGMCC No. 0581, CGMCC No. 0585, CGMCC No. 0582,
CGMCC No. 0583, CBS 109513, DSM 14348, CGMCC No. 0580, DSM 15064, DSM
15065, DSM 15066, DSM 15067, CGMCC No. 0747, CGMCC No. 0748, CGMCC No. 0749,
and CGMCC No. 0750.

In a third embodiment, the present invention relates to polypeptides having
cellobiohydrolase I activity which are encoded by nucleotide sequences which hybridize under
very low stringency conditions, preferably under low stringency conditions, more preferably
under medium stringency conditions, more preferably under medium-high stringency
conditions, even more preferably under high stringency conditions, and most preferably under
very high stringency conditions with a polynucleotide probe selected from the group consisting
of (i) the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 1 to 1578 of SEQ ID NO:1,
nucleotides 1 to 1587 of SEQ ID NO:3,
nucleotides 1 to 1353 of SEQ ID NO:5,
nucleotides 1 to 1371 of SEQ ID NO:7,
nucleotides 1 to 1614 of SEQ ID NO:9,
nucleotides 1 to 1245 of SEQ ID NO:11,
nucleotides 1 to 1341 of SEQ ID NO:13,
nucleotides 1 to 1356 of SEQ ID NO:15,
nucleotides 1 to 1365 of SEQ ID NO:37,
nucleotides 1 to 1377 of SEQ ID NO:39,
nucleotides 1 to 1353 of SEQ ID NO:41,
nucleotides 1 to 1341 of SEQ ID NO:43,
nucleotides 1 to 1584 of SEQ ID NO:45,
nucleotides 1 to 1368 of SEQ ID NO:47,
nucleotides 1 to 1395 of SEQ ID NO:49,
nucleotides 1 to 1383 of SEQ ID NO:51,
nucleotides 1 to 1353 of SEQ ID NO:53,
nucleotides 1 to 1599 of SEQ ID NO:55,
nucleotides 1 to 1383 of SEQ ID NO:57,
nucleotides 1 to 1578 of SEQ ID NO:59, and
nucleotides 1 to 1371 of SEQ ID NO:65;

(ii) the complementary strand of the nucleotides selected from the group consisting of
nucleotides 1 to 500 of SEQ ID NO:1,
nucleotides 1 to 500 of SEQ ID NO:3,
nucleotides 1 to 500 of SEQ ID NO:5,
nucleotides 1 to 500 of SEQ ID NO:7,
nucleotides 1 to 500 of SEQ ID NO:9,
nucleotides 1 to 500 of SEQ ID NO:11,
nucleotides 1 to 500 of SEQ ID NO:13,
nucleotides 1 to 500 of SEQ ID NO:15,
nucleotides 1 to 500 of SEQ ID NO:37,
nucleotides 1 to 500 of SEQ ID NO:39,
nucleotides 1 to 500 of SEQ ID NO:41,
nucleotides 1 to 500 of SEQ ID NO:43,
nucleotides 1 to 500 of SEQ ID NO:45,
nucleotides 1 to 500 of SEQ ID NO:47,
nucleotides 1 to 500 of SEQ ID NO:49,
nucleotides 1 to 500 of SEQ ID NO:51,
nucleotides 1 to 500 of SEQ ID NO:53,
nucleotides 1 to 500 of SEQ ID NO:55,
nucleotides 1 to 500 of SEQ ID NO:57,
nucleotides 1 to 500 of SEQ ID NO:59,
nucleotides 1 to 500 of SEQ ID NO:65,
nucleotides 1 to 221 of SEQ ID NO:17,
nucleotides 1 to 239 of SEQ ID NO:18,
nucleotides 1 to 199 of SEQ ID NO:19,
nucleotides 1 to 191 of SEQ ID NO:20,
nucleotides 1 to 232 of SEQ ID NO:21,
nucleotides 1 to 467 of SEQ ID NO:22,
nucleotides 1 to 534 of SEQ ID NO:23,
nucleotides 1 to 563 of SEQ ID NO:24,
nucleotides 1 to 218 of SEQ ID NO:25,
nucleotides 1 to 492 of SEQ ID NO:26,
nucleotides 1 to 481 of SEQ ID NO:27,
nucleotides 1 to 463 of SEQ ID NO:28,
nucleotides 1 to 513 of SEQ ID NO:29,
nucleotides 1 to 579 of SEQ ID NO:30,
nucleotides 1 to 514 of SEQ ID NO:31,
nucleotides 1 to 477 of SEQ ID NO:32,
nucleotides 1 to 500 of SEQ ID NO:33,
nucleotides 1 to 470 of SEQ ID NO:34,
nucleotides 1 to 491 of SEQ ID NO:35,
nucleotides 1 to 221 of SEQ ID NO:36,
nucleotides 1 to 519 of SEQ ID NO:61,
nucleotides 1 to 497 of SEQ ID NO:62,
nucleotides 1 to 498 of SEQ ID NO:63,
nucleotides 1 to 525 of SEQ ID NO:64, and
nucleotides 1 to 951 of SEQ ID NO:67; and

(iii) the complementary strand of the nucleotides selected from the group consisting of
nucleotides 1 to 200 of SEQ ID NO:1,
nucleotides 1 to 200 of SEQ ID NO:3,
nucleotides 1 to 200 of SEQ ID NO:5,
nucleotides 1 to 200 of SEQ ID NO:7,
nucleotides 1 to 200 of SEQ ID NO:9,
nucleotides 1 to 200 of SEQ ID NO:11,
nucleotides 1 to 200 of SEQ ID NO:13,
nucleotides 1 to 200 of SEQ ID NO:15,
nucleotides 1 to 200 of SEQ ID NO:37,
nucleotides 1 to 200 of SEQ ID NO:39,
nucleotides 1 to 200 of SEQ ID NO:41,
nucleotides 1 to 200 of SEQ ID NO:43,
nucleotides 1 to 200 of SEQ ID NO:45,
nucleotides 1 to 200 of SEQ ID NO:47,
nucleotides 1 to 200 of SEQ ID NO:49,
nucleotides 1 to 200 of SEQ ID NO:51,
nucleotides 1 to 200 of SEQ ID NO:53,
nucleotides 1 to 200 of SEQ ID NO:55,
nucleotides 1 to 200 of SEQ ID NO:57,
nucleotides 1 to 200 of SEQ ID NO:59, and
nucleotides 1 to 200 of SEQ ID NO:65
(J. Sambrook, E.F. Fritsch, and T. Maniatis, 1989, Molecular Cloning, A Laboratory Manual,
2d edition, Cold Spring Harbor, New York).
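
Each probe recited above is the complementary strand of a stretch of nucleotides taken from a sequence listing entry. How such a probe sequence could be derived in silico is sketched below in Python. The example is illustrative only: the function name, the 1-based inclusive numbering and the choice to write the complementary strand in 5' to 3' orientation (i.e. as the reverse complement) are assumptions, and the short test sequence is invented rather than taken from any SEQ ID NO.

```python
# Watson-Crick pairing for DNA, preserving upper/lower case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complementary_strand(sequence: str, first: int, last: int) -> str:
    """Complementary strand of nucleotides `first` to `last` (1-based,
    inclusive) of `sequence`, written 5' to 3'."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("nucleotide range lies outside the sequence")
    region = sequence[first - 1:last]
    # Complement each base, then reverse so the result reads 5' to 3'.
    return region.translate(COMPLEMENT)[::-1]
```

A probe corresponding to, e.g., "nucleotides 1 to 500" of a given sequence would be obtained by calling the function with first=1 and last=500 on that sequence.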


In another embodiment, the present invention relates to polypeptides having
cellobiohydrolase I activity which are encoded by the cellobiohydrolase I encoding part of the
nucleotide sequence present in a microorganism selected from the group consisting of:

a microorganism belonging to Zygomycota, preferably belonging to the Mucorales, more
preferably belonging to the family Mucoraceae, most preferably belonging to the genus
Rhizomucor (e.g. Rhizomucor pusillus), or the family Choanephoraceae, most preferably
belonging to the genus Poitrasia (e.g. Poitrasia circinans),

a microorganism belonging to the Oomycetes, preferably to the order Pythiales, more
preferably to the family Pythiaceae, most preferably to the genus Phytophthora (e.g.
Phytophthora infestans),

a microorganism belonging to Auriculariales (an order of the Basidiomycota,
Hymenomycetes), preferably belonging to the family Exidiaceae, more preferably belonging to
the genus Exidia (e.g. Exidia glandulosa),

a microorganism belonging to Xylariales (an order of the Ascomycota, Sordariomycetes),
preferably belonging to the family Xylariaceae, more preferably belonging to the genus Xylaria
(e.g. Xylaria hypoxylon),

a microorganism belonging to Dothideales (an order of the Ascomycota, Dothideomycetes),
preferably belonging to the family Dothideaceae, more preferably belonging to the genus
Diplodia (e.g. Diplodia gossypina),

a microorganism belonging to Pezizales (an order of the Ascomycota), preferably belonging to
the family Pyronemataceae, more preferably belonging to the genus Trichophaea (e.g.
Trichophaea saccata), or the family Sarcosomataceae, more preferably belonging to the
genus Pseudoplectania (e.g. Pseudoplectania nigrella),

a microorganism belonging to the family Rigidiporaceae (under Basidiomycota,
Hymenomycetes, Hymenomycetales), more preferably belonging to the genus Meripilus (e.g.
Meripilus giganteus),

a microorganism belonging to the family Meruliaceae (under Basidiomycota, Hymenomycetes,
Stereales), more preferably belonging to the genus Sporotrichum (Sporotrichum sp.),

a microorganism belonging to the family Agaricaceae (under Basidiomycota, Hymenomycetes,
Agaricales), more preferably belonging to the genus Coprinus (e.g. Coprinus cinereus),

a microorganism belonging to the family Hypocreaceae (under Ascomycota, Sordariomycetes,
Hypocreales), more preferably belonging to the genus Acremonium (e.g. Acremonium
thermophilum, Acremonium sp.) or the (mitosporic) genus Verticillium (e.g. Verticillium
tenerum),

a microorganism belonging to the genus Cladorrhinum (under Ascomycota, Sordariomycetes,
Sordariales, Sordariaceae) e.g. Cladorrhinum foecundissimum,

a microorganism belonging to the genus Myceliophthora (under Ascomycota,
Sordariomycetes, Sordariales, Sordariaceae) e.g. Myceliophthora thermophila or
Myceliophthora hinnulea,

a microorganism belonging to the genus Chaetomium (under Ascomycota, Sordariomycetes,
Sordariales, Chaetomiaceae) e.g. Chaetomium thermophilum,

a microorganism belonging to the genus Chaetomidium (under Ascomycota, Sordariomycetes,
Sordariales, Chaetomiaceae) e.g. Chaetomidium pingtungium,

a microorganism belonging to the genus Thielavia (under Ascomycota, Sordariomycetes,
Sordariales, Chaetomiaceae) e.g. Thielavia australiensis or Thielavia microspora,

a microorganism belonging to the genus Thermoascus (under Ascomycota, Eurotiomycetes,
Eurotiales, Trichocomaceae) e.g. Thermoascus aurantiacus,

a microorganism belonging to the genus Trichothecium (mitosporic Ascomycota) e.g.
Trichothecium roseum, and

a microorganism belonging to the species Humicola nigrescens.

A nucleotide sequence selected from the group consisting of SEQ ID NO:1, SEQ ID
NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID
NO:15, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ
ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57,
SEQ ID NO:59, SEQ ID NO:65, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID
NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ
ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31,
SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID
NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, and SEQ ID NO:67, or a
subsequence thereof, as well as an amino acid sequence selected from the group consisting
of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12,
SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID
NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ
ID NO:56, SEQ ID NO:58, SEQ ID NO:60, and SEQ ID NO:66, or a fragment thereof, may be
used to design a polynucleotide probe to identify and clone DNA encoding polypeptides having
cellobiohydrolase I activity from strains of different genera or species according to methods
well known in the art. In particular, such probes can be used for hybridization with the
genomic DNA or cDNA of the genus or species of interest, following standard Southern blotting
procedures, in order to identify and isolate the corresponding gene therein. Such probes can
be considerably shorter than the entire sequence, but should be at least 15, preferably at least
25, more preferably at least 35 nucleotides in length, such as at least 70 nucleotides in length.
It is, however, preferred that the polynucleotide probe is at least 100 nucleotides in length. For
example, the polynucleotide probe may be at least 200 nucleotides in length, at least 300
nucleotides in length, at least 400 nucleotides in length or at least 500 nucleotides in length.
Even longer probes may be used, e.g., polynucleotide probes which are at least 600
nucleotides in length, at least 700 nucleotides in length, at least 800 nucleotides in length, or
at least 900 nucleotides in length. Both DNA and RNA probes can be used. The probes are
typically labeled for detecting the corresponding gene (for example, with ³²P, ³H, ³⁵S, biotin, or
avidin).

Thus, a genomic DNA or cDNA library prepared from such other organisms may be
screened for DNA which hybridizes with the probes described above and which encodes a
polypeptide having cellobiohydrolase I activity. Genomic or other DNA from such other
organisms may be separated by agarose or polyacrylamide gel electrophoresis, or other
separation techniques. DNA from the libraries or the separated DNA may be transferred to,
and immobilized on, nitrocellulose or other suitable carrier materials. In order to identify a
clone or DNA which is homologous with SEQ ID NO:1, the carrier material with the immobilized
DNA is used in a Southern blot.

For purposes of the present invention, hybridization indicates that the nucleotide
sequence hybridizes to a labeled polynucleotide probe which hybridizes to the nucleotide
sequence shown in SEQ ID NO:1 under very low to very high stringency conditions.
Molecules to which the polynucleotide probe hybridizes under these conditions may be
detected using X-ray film or by any other method known in the art. Whenever the term
"polynucleotide probe" is used in the present context, it is to be understood that such a probe
contains at least 15 nucleotides.

In an interesting embodiment, the polynucleotide probe is the complementary strand of
the nucleotides selected from the group consisting of:

nucleotides 1 to 1578 of SEQ ID NO:1,
nucleotides 1 to 1302 of SEQ ID NO:1,
nucleotides 1 to 1587 of SEQ ID NO:3,
nucleotides 1 to 1302 of SEQ ID NO:3,
nucleotides 1 to 1353 of SEQ ID NO:5,
nucleotides 1 to 1302 of SEQ ID NO:5,
nucleotides 1 to 1371 of SEQ ID NO:7,
nucleotides 1 to 1302 of SEQ ID NO:7,
nucleotides 1 to 1614 of SEQ ID NO:9,
nucleotides 1 to 1302 of SEQ ID NO:9,
nucleotides 1 to 1245 of SEQ ID NO:11,
nucleotides 1 to 1341 of SEQ ID NO:13,
nucleotides 1 to 1302 of SEQ ID NO:13,
nucleotides 1 to 1356 of SEQ ID NO:15,
nucleotides 1 to 1302 of SEQ ID NO:15,
nucleotides 1 to 1365 of SEQ ID NO:37,
nucleotides 1 to 1302 of SEQ ID NO:37,
nucleotides 1 to 1377 of SEQ ID NO:39,
nucleotides 1 to 1302 of SEQ ID NO:39,
nucleotides 1 to 1353 of SEQ ID NO:41,
nucleotides 1 to 1302 of SEQ ID NO:41,
nucleotides 1 to 1341 of SEQ ID NO:43,
nucleotides 1 to 1302 of SEQ ID NO:43,
nucleotides 1 to 1584 of SEQ ID NO:45,
nucleotides 1 to 1302 of SEQ ID NO:45,
nucleotides 1 to 1368 of SEQ ID NO:47,
nucleotides 1 to 1302 of SEQ ID NO:47,
nucleotides 1 to 1395 of SEQ ID NO:49,
nucleotides 1 to 1302 of SEQ ID NO:49,
nucleotides 1 to 1383 of SEQ ID NO:51,
nucleotides 1 to 1302 of SEQ ID NO:51,
nucleotides 1 to 1353 of SEQ ID NO:53,
nucleotides 1 to 1302 of SEQ ID NO:53,
nucleotides 1 to 1599 of SEQ ID NO:55,
nucleotides 1 to 1302 of SEQ ID NO:55,
nucleotides 1 to 1383 of SEQ ID NO:57,
nucleotides 1 to 1302 of SEQ ID NO:57,
nucleotides 1 to 1578 of SEQ ID NO:59,
nucleotides 1 to 1302 of SEQ ID NO:59,
nucleotides 1 to 1371 of SEQ ID NO:65, and
nucleotides 1 to 1302 of SEQ ID NO:65;

or the complementary strand of the nucleotides selected from the group consisting of:

nucleotides 1 to 500 of SEQ ID NO:1,
nucleotides 1 to 500 of SEQ ID NO:3,
nucleotides 1 to 500 of SEQ ID NO:5,
nucleotides 1 to 500 of SEQ ID NO:7,
nucleotides 1 to 500 of SEQ ID NO:9,
nucleotides 1 to 500 of SEQ ID NO:11,
nucleotides 1 to 500 of SEQ ID NO:13,
nucleotides 1 to 500 of SEQ ID NO:15,
nucleotides 1 to 500 of SEQ ID NO:37,
nucleotides 1 to 500 of SEQ ID NO:39,
nucleotides 1 to 500 of SEQ ID NO:41,
nucleotides 1 to 500 of SEQ ID NO:43,
nucleotides 1 to 500 of SEQ ID NO:45,
nucleotides 1 to 500 of SEQ ID NO:47,
nucleotides 1 to 500 of SEQ ID NO:49,
nucleotides 1 to 500 of SEQ ID NO:51,
nucleotides 1 to 500 of SEQ ID NO:53,
nucleotides 1 to 500 of SEQ ID NO:55,
nucleotides 1 to 500 of SEQ ID NO:57,
nucleotides 1 to 500 of SEQ ID NO:59,
nucleotides 1 to 500 of SEQ ID NO:65,
nucleotides 1 to 221 of SEQ ID NO:17,
nucleotides 1 to 239 of SEQ ID NO:18,
nucleotides 1 to 199 of SEQ ID NO:19,
nucleotides 1 to 191 of SEQ ID NO:20,
nucleotides 1 to 232 of SEQ ID NO:21,
nucleotides 1 to 467 of SEQ ID NO:22,
nucleotides 1 to 534 of SEQ ID NO:23,
nucleotides 1 to 563 of SEQ ID NO:24,
nucleotides 1 to 218 of SEQ ID NO:25,
nucleotides 1 to 492 of SEQ ID NO:26,
nucleotides 1 to 481 of SEQ ID NO:27,
nucleotides 1 to 463 of SEQ ID NO:28,
nucleotides 1 to 513 of SEQ ID NO:29,
nucleotides 1 to 579 of SEQ ID NO:30,
nucleotides 1 to 514 of SEQ ID NO:31,
nucleotides 1 to 477 of SEQ ID NO:32,
nucleotides 1 to 500 of SEQ ID NO:33,
nucleotides 1 to 470 of SEQ ID NO:34,
nucleotides 1 to 491 of SEQ ID NO:35,
nucleotides 1 to 221 of SEQ ID NO:36,
nucleotides 1 to 519 of SEQ ID NO:61,
nucleotides 1 to 497 of SEQ ID NO:62,
nucleotides 1 to 498 of SEQ ID NO:63,
nucleotides 1 to 525 of SEQ ID NO:64, and
nucleotides 1 to 951 of SEQ ID NO:67;

or the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 1 to 200 of SEQ ID NO:64, and
nucleotides 1 to 200 of SEQ ID NO:67.

In another interesting embodiment, the polynucleotide probe is the complementary
strand of the nucleotide sequence which encodes a polypeptide selected from the group
consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ
ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42,
SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID
NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, and SEQ ID NO:66. In a further
interesting embodiment, the polynucleotide probe is the complementary strand of a nucleotide
sequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5,
SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:37,
SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID
NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, and
SEQ ID NO:65. In another interesting embodiment, the polynucleotide probe is the
complementary strand of the nucleotide sequence contained in a plasmid which is contained in
a deposited microorganism selected from the group consisting of CGMCC No. 0584, CGMCC
No. 0581, CGMCC No. 0585, CGMCC No. 0582, CGMCC No. 0583, CGMCC No. 0580, CBS
109513, DSM 14348, DSM 15064, DSM 15065, DSM 15066, DSM 15067, CGMCC No. 0747,
CGMCC No. 0748, CGMCC No. 0749, and CGMCC No. 0750.
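
By way of illustration only, the phrase "the complementary strand of nucleotides 1 to N" can be expressed in sequence terms as follows. This is a minimal sketch: the function name `probe_from_region` and the toy sequence are hypothetical and are not taken from the sequence listing, and positions are assumed to be 1-based and inclusive, as in "nucleotides 1 to 500".

```python
# Illustrative sketch only: function name and toy sequence are assumptions,
# not part of the disclosed sequences.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def probe_from_region(seq: str, start: int, end: int) -> str:
    """Return the complementary strand, written 5'->3', of nucleotides
    start..end of seq (1-based, inclusive)."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("region lies outside the sequence")
    region = seq[start - 1:end]
    # Complement each base, then reverse so the probe reads 5'->3'.
    return region.translate(_COMPLEMENT)[::-1]


toy_sequence = "ATGCGTACCA"  # hypothetical 10-nt sequence
probe = probe_from_region(toy_sequence, 1, 6)
```

A real probe would of course be derived from one of the listed sequences and would satisfy the minimum length of 15 nucleotides stated above.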

For long probes of at least 100 nucleotides in length, very low to very high stringency
conditions are defined as prehybridization and hybridization at 42°C in 5X SSPE, 1.0% SDS,
5X Denhardt's solution, 100 µg/ml sheared and denatured salmon sperm DNA, following
standard Southern blotting procedures. Preferably, the long probes of at least 100 nucleotides
do not contain more than 1000 nucleotides. For long probes of at least 100 nucleotides in
length, the carrier material is finally washed three times each for 15 minutes using 2 x SSC,
0.1% SDS at 42°C (very low stringency), preferably washed three times each for 15 minutes
using 0.5 x SSC, 0.1% SDS at 42°C (low stringency), more preferably washed three times
each for 15 minutes using 0.2 x SSC, 0.1% SDS at 42°C (medium stringency), even more
preferably washed three times each for 15 minutes using 0.2 x SSC, 0.1% SDS at 55°C
(medium-high stringency), most preferably washed three times each for 15 minutes using 0.1
x SSC, 0.1% SDS at 60°C (high stringency), in particular washed three times each for 15
minutes using 0.1 x SSC, 0.1% SDS at 68°C (very high stringency).
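
The six wash conditions defined above for long probes can be summarised as a lookup table. The following minimal sketch merely restates the values given in the text; the names `Wash`, `LONG_PROBE_WASH` and `describe_wash` are hypothetical and introduced for illustration.

```python
from typing import NamedTuple


class Wash(NamedTuple):
    ssc: float      # SSC concentration (x-fold)
    sds_pct: float  # SDS concentration in percent
    temp_c: int     # wash temperature in degrees Celsius


# Values restated from the text; ordered from least to most stringent.
LONG_PROBE_WASH = {
    "very low": Wash(2.0, 0.1, 42),
    "low": Wash(0.5, 0.1, 42),
    "medium": Wash(0.2, 0.1, 42),
    "medium-high": Wash(0.2, 0.1, 55),
    "high": Wash(0.1, 0.1, 60),
    "very high": Wash(0.1, 0.1, 68),
}


def describe_wash(level: str) -> str:
    """Format the final wash for a given stringency level."""
    w = LONG_PROBE_WASH[level]
    return f"3 x 15 min in {w.ssc:g} x SSC, {w.sds_pct:g}% SDS at {w.temp_c}°C"
```

Read in order, the table shows that stringency is raised either by lowering the salt concentration or by raising the temperature, never the reverse.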

Although not particularly preferred, it is contemplated that shorter probes, e.g. probes
which are from about 15 to 99 nucleotides in length, such as from about 15 to about 70
nucleotides in length, may also be used. For such short probes, stringency conditions are
defined as prehybridization, hybridization, and washing post-hybridization at 5°C to 10°C
below the calculated Tm using the calculation according to Bolton and McCarthy (1962,
Proceedings of the National Academy of Sciences USA 48:1390) in 0.9 M NaCl, 0.09 M Tris-
HCl pH 7.6, 6 mM EDTA, 0.5% NP-40, 1X Denhardt's solution, 1 mM sodium pyrophosphate,
1 mM sodium monobasic phosphate, 0.1 mM ATP, and 0.2 mg of yeast RNA per ml following
standard Southern blotting procedures.

For short probes which are about 15 nucleotides to 99 nucleotides in length, the carrier
material is washed once in 6X SSC plus 0.1% SDS for 15 minutes and twice each for 15
minutes using 6X SSC at 5°C to 10°C below the calculated Tm.
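
The text cites Bolton and McCarthy for the Tm calculation but does not reproduce the formula. Purely as a sketch, the following uses one commonly quoted salt-adjusted approximation, Tm = 81.5 + 16.6 log10[Na+] + 0.41 (%GC) - 600/N; the constants (in particular the length term, which differs between references) are an assumption and not taken from this document, and the function names are hypothetical.

```python
import math


def approx_tm(seq: str, na_molar: float = 0.9) -> float:
    """Approximate Tm in degrees Celsius (assumed formula, see lead-in)."""
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    upper = seq.upper()
    gc_pct = 100.0 * (upper.count("G") + upper.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_pct - 600.0 / n


def short_probe_window(seq: str, na_molar: float = 0.9) -> tuple:
    """Temperature range 10 to 5 degrees below the approximate Tm."""
    tm = approx_tm(seq, na_molar)
    return (tm - 10.0, tm - 5.0)
```

The default of 0.9 M corresponds to the NaCl concentration of the hybridization buffer stated above; any laboratory use would rely on the cited calculation rather than on this approximation.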



Sources for Polypeptides Having Cellobiohydrolase I Activity 

A polypeptide of the present invention may be obtained from microorganisms of any
genus. For purposes of the present invention, the term "obtained from" as used herein shall
mean that the polypeptide encoded by the nucleotide sequence is produced by a cell in which
the nucleotide sequence is naturally present or into which the nucleotide sequence has been
inserted. In a preferred embodiment, the polypeptide is secreted extracellularly.

A polypeptide of the present invention may be a bacterial polypeptide. For example, the
polypeptide may be a gram positive bacterial polypeptide such as a Bacillus polypeptide, e.g.,
a Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus
coagulans, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus
stearothermophilus, Bacillus subtilis, or Bacillus thuringiensis polypeptide; or a Streptomyces
polypeptide, e.g., a Streptomyces lividans or Streptomyces murinus polypeptide; or a gram
negative bacterial polypeptide, e.g., an E. coli or a Pseudomonas sp. polypeptide.

A polypeptide of the present invention may be a fungal polypeptide, and more preferably
a yeast polypeptide such as a Candida, Kluyveromyces, Neocallimastix, Pichia, Piromyces,
Saccharomyces, Schizosaccharomyces, or Yarrowia polypeptide; or more preferably a
filamentous fungal polypeptide such as an Acremonium, Aspergillus, Aureobasidium,
Cryptococcus, Filibasidium, Fusarium, Humicola, Magnaporthe, Mucor, Myceliophthora,
Neurospora, Paecilomyces, Penicillium, Schizophyllum, Talaromyces, Thermoascus,
Thielavia, Tolypocladium, or Trichoderma polypeptide.

In an interesting embodiment, the polypeptide is a Saccharomyces carlsbergensis,
Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii,
Saccharomyces kluyveri, Saccharomyces norbensis or Saccharomyces oviformis polypeptide.

In another interesting embodiment, the polypeptide is an Aspergillus aculeatus,
Aspergillus awamori, Aspergillus foetidus, Aspergillus japonicus, Aspergillus nidulans,
Aspergillus niger, Aspergillus oryzae, Fusarium bactridioides, Fusarium cerealis, Fusarium
crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium
heterosporum, Fusarium negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium
roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium
sulphureum, Fusarium torulosum, Fusarium trichothecioides, Fusarium venenatum, Humicola
insolens, Humicola lanuginosa, Mucor miehei, Myceliophthora thermophila, Neurospora
crassa, Penicillium purpurogenum, Trichoderma harzianum, Trichoderma koningii,
Trichoderma longibrachiatum, Trichoderma reesei, or Trichoderma viride polypeptide.

In a preferred embodiment, the polypeptide is an Acremonium thermophilum,
Chaetomium thermophilum, Scytalidium sp., Scytalidium thermophilum, Thermoascus
aurantiacus, Thielavia australiensis, Verticillium tenerum, Neotermes castaneus,
Melanocarpus albomyces, Poitrasia circinans, Coprinus cinereus, Trichothecium roseum,
Humicola nigrescens, Cladorrhinum foecundissimum, Diplodia gossypina, Myceliophthora
thermophila, Rhizomucor pusillus, Meripilus giganteus, Exidia glandulosa, Xylaria hypoxylon,
Trichophaea saccata, Acremonium sp., Chaetomium sp., Chaetomidium pingtungium,
Myceliophthora thermophila, Myceliophthora hinnulea, Sporotrichum pruinosum, Thielavia cf.
microspora, Aspergillus sp., Scopulariopsis sp., Fusarium sp., Verticillium sp.,
Pseudoplectania nigrella, or Phytophthora infestans polypeptide.

In a more preferred embodiment, the polypeptide is an Acremonium thermophilum,
Chaetomium thermophilum, Scytalidium sp., Scytalidium thermophilum, Thermoascus
aurantiacus, Thielavia australiensis, Verticillium tenerum, Neotermes castaneus,
Melanocarpus albomyces, Poitrasia circinans, or Coprinus cinereus polypeptide, e.g., the
polypeptide consisting of an amino acid sequence selected from the group consisting of SEQ
ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID
NO:14, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ
ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56,
SEQ ID NO:58, SEQ ID NO:60, and SEQ ID NO:66.

It will be understood that for the aforementioned species, the invention encompasses
both the perfect and imperfect states, and other taxonomic equivalents, e.g., anamorphs,
regardless of the species name by which they are known. Those skilled in the art will readily
recognize the identity of appropriate equivalents.

Strains of these species are readily accessible to the public in a number of culture
collections, such as the American Type Culture Collection (ATCC), Deutsche Sammlung von
Mikroorganismen und Zellkulturen GmbH (DSMZ), China General Microbiological Culture
Collection Center (CGMCC), Centraalbureau Voor Schimmelcultures (CBS), and Agricultural
Research Service Patent Culture Collection, Northern Regional Research Center (NRRL).

Furthermore, such polypeptides may be identified and obtained from other sources
including microorganisms isolated from nature (e.g., soil, water, plants, animals, etc.) using the
above-mentioned probes. Techniques for isolating microorganisms from natural habitats are
well known in the art. The nucleotide sequence may then be derived by similarly screening a
genomic or cDNA library of another microorganism. Once a nucleotide sequence encoding a
polypeptide has been detected with the probe(s), the sequence may be isolated or cloned by
utilizing techniques which are known to those of ordinary skill in the art (see, e.g., Sambrook et
al., 1989, supra).

Polypeptides encoded by nucleotide sequences of the present invention also include
fused polypeptides or cleavable fusion polypeptides in which another polypeptide is fused at
the N-terminus or the C-terminus of the polypeptide or fragment thereof. A fused polypeptide
is produced by fusing a nucleotide sequence (or a portion thereof) encoding another
polypeptide to a nucleotide sequence (or a portion thereof) of the present invention.
Techniques for producing fusion polypeptides are known in the art, and include ligating the
coding sequences encoding the polypeptides so that they are in frame and that expression of
the fused polypeptide is under control of the same promoter(s) and terminator.
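
The requirement that the ligated coding sequences be "in frame" can be illustrated with a minimal sketch. The function `fuse_in_frame` and the toy coding sequences are hypothetical; real constructs may additionally include linkers or cleavage sites, which are not modelled here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(upstream_cds: str, downstream_cds: str) -> str:
    """Join two coding sequences so that the downstream one is read in the
    same frame as the upstream one (illustrative assumption: both inputs
    start at the first base of a codon)."""
    up = upstream_cds.upper()
    down = downstream_cds.upper()
    if len(up) % 3 or len(down) % 3:
        raise ValueError("each coding sequence must be a whole number of codons")
    # An upstream stop codon would terminate translation before the fusion
    # partner, so it is removed.
    if up[-3:] in STOP_CODONS:
        up = up[:-3]
    return up + down


fusion = fuse_in_frame("ATGGCCTAA", "ATGAAATGA")  # toy sequences
```

Because both parts are whole numbers of codons, the junction falls on a codon boundary and a single promoter and terminator can drive expression of the whole fusion.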

Polynucleotides and Nucleotide Sequences 

The present invention also relates to polynucleotides having a nucleotide sequence
which encodes for a polypeptide of the invention. In particular, the present invention relates to
polynucleotides consisting of a nucleotide sequence which encodes for a polypeptide of the
invention. In a preferred embodiment, the nucleotide sequence is selected from the group
consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ
ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41,
SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID
NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, and SEQ ID NO:65. In a more
preferred embodiment, the nucleotide sequence is the mature polypeptide coding region
contained in a plasmid which is contained in a deposited microorganism selected from the
group consisting of CGMCC No. 0584, CGMCC No. 0581, CGMCC No. 0585, CGMCC No.
0582, CGMCC No. 0583, CGMCC No. 0580, CBS 109513, DSM 14348, DSM 15064, DSM
15065, DSM 15066, DSM 15067, CGMCC No. 0747, CGMCC No. 0748, CGMCC No. 0749,
and CGMCC No. 0750. The present invention also encompasses polynucleotides comprising,
preferably consisting of, nucleotide sequences which encode a polypeptide consisting of an
amino acid sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ
ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ
ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48,
SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID
NO:60, and SEQ ID NO:66, which differ from a nucleotide sequence selected from the group
consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ
ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41,
SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID
NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, and SEQ ID NO:65 by virtue of the
degeneracy of the genetic code.

The present invention also relates to polynucleotides comprising, preferably consisting
of, a subsequence of a nucleotide sequence selected from the group consisting of SEQ ID
NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID
NO:13, SEQ ID NO:15, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ
ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55,
SEQ ID NO:57, SEQ ID NO:59, and SEQ ID NO:65 which encode fragments of an amino acid
sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6,
SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID
NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ
ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60,
and SEQ ID NO:66 that have cellobiohydrolase I activity. A subsequence of a nucleotide
sequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5,
SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:37,
SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID
NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, and
SEQ ID NO:65 is a nucleotide sequence encompassed by a sequence selected from the
group consisting of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9,
SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:37, SEQ ID NO:39, SEQ ID
NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ
ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, and SEQ ID NO:65 except that one
or more nucleotides from the 5' and/or 3' end have been deleted.
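
The definition of a subsequence given above (a sequence from which one or more nucleotides have been deleted from the 5' and/or 3' end) can be sketched as follows. The function name `subsequence` and the toy sequence are hypothetical; the sketch does not test whether the encoded fragment retains cellobiohydrolase I activity.

```python
def subsequence(seq: str, trim_5: int = 0, trim_3: int = 0) -> str:
    """Delete trim_5 nucleotides from the 5' end and trim_3 nucleotides
    from the 3' end of seq."""
    if trim_5 < 0 or trim_3 < 0:
        raise ValueError("trim lengths must be non-negative")
    if trim_5 + trim_3 == 0:
        raise ValueError("at least one nucleotide must be deleted")
    if trim_5 + trim_3 >= len(seq):
        raise ValueError("nothing would remain of the sequence")
    return seq[trim_5:len(seq) - trim_3]


toy = "ATGGCCAAA"  # hypothetical sequence
five_prime_trimmed = subsequence(toy, trim_5=3)
both_ends_trimmed = subsequence(toy, trim_5=1, trim_3=2)
```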

The present invention also relates to polynucleotides having, preferably consisting of, a
modified nucleotide sequence which comprises at least one modification in the mature
polypeptide coding sequence selected from the group consisting of SEQ ID NO:1, SEQ ID
NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID
NO:15, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ
ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57,
SEQ ID NO:59, and SEQ ID NO:65, and where the modified nucleotide sequence encodes a
polypeptide which consists of an amino acid sequence selected from the group consisting of
SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12,
SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID
NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ
ID NO:56, SEQ ID NO:58, SEQ ID NO:60, and SEQ ID NO:66.

The techniques used to isolate or clone a nucleotide sequence encoding a polypeptide
are known in the art and include isolation from genomic DNA, preparation from cDNA, or a
combination thereof. The cloning of the nucleotide sequences of the present invention from
such genomic DNA can be effected, e.g., by using the well known polymerase chain reaction
(PCR) or antibody screening of expression libraries to detect cloned DNA fragments with
shared structural features. See, e.g., Innis et al., 1990, PCR: A Guide to Methods and
Application, Academic Press, New York. Other amplification procedures such as ligase chain
reaction (LCR), ligation activated transcription (LAT) and nucleic acid sequence-based
amplification (NASBA) may be used. The nucleotide sequence may be cloned from a strain
selected from the group consisting of Acremonium, Scytalidium, Thermoascus, Thielavia,
Verticillium, Neotermes, Melanocarpus, Poitrasia, Coprinus, Trichothecium, Humicola,
Cladorrhinum, Diplodia, Myceliophthora, Rhizomucor, Meripilus, Exidia, Xylaria, Trichophaea,
Chaetomium, Chaetomidium, Sporotrichum, Thielavia, Aspergillus, Scopulariopsis, Fusarium,
Pseudoplectania, and Phytophthora, or another or related organism and thus, for example,
may be an allelic or species variant of the polypeptide encoding region of the nucleotide
sequence.

The nucleotide sequence may be obtained by standard cloning procedures used in
genetic engineering to relocate the nucleotide sequence from its natural location to a different
site where it will be reproduced. The cloning procedures may involve excision and isolation of
a desired fragment comprising the nucleotide sequence encoding the polypeptide, insertion of
the fragment into a vector molecule, and incorporation of the recombinant vector into a host
cell where multiple copies or clones of the nucleotide sequence will be replicated. The
nucleotide sequence may be of genomic, cDNA, RNA, semisynthetic, synthetic origin, or any
combinations thereof.

The present invention also relates to a polynucleotide comprising, preferably consisting
of, a nucleotide sequence which has a degree of identity with a nucleotide sequence selected
from the group consisting of

nucleotides 1 to 1578 of SEQ ID NO:1,
nucleotides 1 to 1587 of SEQ ID NO:3,
nucleotides 1 to 1353 of SEQ ID NO:5,
nucleotides 1 to 1371 of SEQ ID NO:7,
nucleotides 1 to 1614 of SEQ ID NO:9,
nucleotides 1 to 1245 of SEQ ID NO:11,
nucleotides 1 to 1341 of SEQ ID NO:13,
nucleotides 1 to 1356 of SEQ ID NO:15,
nucleotides 1 to 1365 of SEQ ID NO:37,
nucleotides 1 to 1377 of SEQ ID NO:39,
nucleotides 1 to 1353 of SEQ ID NO:41,
nucleotides 1 to 1341 of SEQ ID NO:43,
nucleotides 1 to 1584 of SEQ ID NO:45,
nucleotides 1 to 218 of SEQ ID NO:25,
nucleotides 1 to 492 of SEQ ID NO:26,
nucleotides 1 to 481 of SEQ ID NO:27,
nucleotides 1 to 463 of SEQ ID NO:28,
nucleotides 1 to 513 of SEQ ID NO:29,
nucleotides 1 to 579 of SEQ ID NO:30,
nucleotides 1 to 514 of SEQ ID NO:31,
nucleotides 1 to 477 of SEQ ID NO:32,
nucleotides 1 to 500 of SEQ ID NO:33,
nucleotides 1 to 470 of SEQ ID NO:34,
nucleotides 1 to 491 of SEQ ID NO:35,
nucleotides 1 to 221 of SEQ ID NO:36,
nucleotides 1 to 519 of SEQ ID NO:61,
nucleotides 1 to 497 of SEQ ID NO:62,
nucleotides 1 to 498 of SEQ ID NO:63,
nucleotides 1 to 525 of SEQ ID NO:64, and
nucleotides 1 to 951 of SEQ ID NO:67

of at least 70% identity, such as at least 75% identity; preferably, the nucleotide sequence has
at least 80% identity, e.g. at least 85% identity, such as at least 90% identity, more preferably
at least 95% identity, such as at least 96% identity, e.g. at least 97% identity, even more
preferably at least 98% identity, such as at least 99%. Preferably, the nucleotide sequence
encodes a polypeptide having cellobiohydrolase I activity. The degree of identity between two
nucleotide sequences is determined as described previously (see the section entitled
"Definitions").

In another interesting aspect, the present invention relates to a polynucleotide having,
preferably consisting of, a nucleotide sequence which has at least 65% identity with the
cellobiohydrolase I encoding part of the nucleotide sequence inserted into a plasmid present in
a deposited microorganism selected from the group consisting of CGMCC No. 0584, CGMCC
No. 0581, CGMCC No. 0585, CGMCC No. 0582, CGMCC No. 0583, CGMCC No. 0580, CBS
109513, DSM 14348, DSM 15064, DSM 15065, DSM 15066, DSM 15067, CGMCC No. 0747,
CGMCC No. 0748, CGMCC No. 0749, and CGMCC No. 0750. In a preferred embodiment, the
degree of identity with the cellobiohydrolase I encoding part of the nucleotide sequence
inserted into a plasmid present in a deposited microorganism selected from the group
consisting of CGMCC No. 0584, CGMCC No. 0581, CGMCC No. 0585, CGMCC No. 0582,
CGMCC No. 0583, CGMCC No. 0580, CBS 109513, DSM 14348, DSM 15064, DSM 15065,
DSM 15066, DSM 15067, CGMCC No. 0747, CGMCC No. 0748, CGMCC No. 0749, and
CGMCC No. 0750 is at least 70%, e.g. at least 80%, such as at least 90%, more preferably at
least 95%, such as at least 96%, e.g. at least 97%, even more preferably at least 98%, such
as at least 99%. Preferably, the nucleotide sequence comprises the cellobiohydrolase I
encoding part of the nucleotide sequence inserted into a plasmid present in a deposited
microorganism selected from the group consisting of CGMCC No. 0584, CGMCC No. 0581,
CGMCC No. 0585, CGMCC No. 0582, CGMCC No. 0583, CGMCC No. 0580, CBS 109513,
DSM 14348, DSM 15064, DSM 15065, DSM 15066, DSM 15067, CGMCC No. 0747, CGMCC
No. 0748, CGMCC No. 0749, and CGMCC No. 0750. In an even more preferred embodiment,
the nucleotide sequence consists of the cellobiohydrolase I encoding part of the nucleotide
sequence inserted into a plasmid present in a deposited microorganism selected from the
group consisting of CGMCC No. 0584, CGMCC No. 0581, CGMCC No. 0585, CGMCC No.
0582, CGMCC No. 0583, CGMCC No. 0580, CBS 109513, DSM 14348, DSM 15064, DSM
15065, DSM 15066, DSM 15067, CGMCC No. 0747, CGMCC No. 0748, CGMCC No. 0749,
and CGMCC No. 0750.
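
The graded identity thresholds recited in the two preceding passages can be read as preference tiers. The following minimal sketch restates them and reports the highest tier a given percent identity satisfies. The names `SEQUENCE_TIERS`, `DEPOSIT_TIERS` and `highest_tier_met` are hypothetical, and the percent identity itself is assumed to have been computed beforehand by the method of the "Definitions" section, which is not reproduced here.

```python
from typing import Optional, Sequence

# Thresholds (percent identity) restated from the text.
SEQUENCE_TIERS = (70, 75, 80, 85, 90, 95, 96, 97, 98, 99)  # vs. listed sequences
DEPOSIT_TIERS = (65, 70, 80, 90, 95, 96, 97, 98, 99)       # vs. deposited inserts


def highest_tier_met(percent_identity: float,
                     tiers: Sequence[int]) -> Optional[int]:
    """Return the highest threshold satisfied, or None if below all tiers."""
    met = [t for t in tiers if percent_identity >= t]
    return max(met) if met else None
```

For example, a value of 67% identity meets the lowest tier for the deposited inserts but no tier for the listed sequences.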

Modification of a nucleotide sequence encoding a polypeptide of the present invention
may be necessary for the synthesis of a polypeptide, which comprises an amino acid
sequence that has at least one substitution, deletion and/or insertion as compared to an amino
acid sequence selected from the group consisting of SEQ ID NO:2, SEQ ID NO:4, SEQ ID
NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID
NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ
ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60,
and SEQ ID NO:66. These artificial variants may differ in some engineered way from the
polypeptide isolated from its native source, e.g., variants that differ in specific activity,
thermostability, pH optimum, or the like.

It will be apparent to those skilled in the art that such modifications can be made outside
the regions critical to the function of the molecule and still result in an active polypeptide.
Amino acid residues essential to the activity of the polypeptide encoded by the nucleotide
sequence of the invention, and therefore preferably not subject to modification, such as
substitution, may be identified according to procedures known in the art, such as site-directed
mutagenesis or alanine-scanning mutagenesis (see, e.g., Cunningham and Wells, 1989,
Science 244: 1081-1085). In the latter technique, mutations are introduced at every positively
charged residue in the molecule, and the resultant mutant molecules are tested for
cellobiohydrolase I activity to identify amino acid residues that are critical to the activity of the
molecule. Sites of substrate-enzyme interaction can also be determined by analysis of the
three-dimensional structure as determined by such techniques as nuclear magnetic resonance
analysis, crystallography or photoaffinity labelling (see, e.g., de Vos et al., 1992, Science 255:
306-312; Smith et al., 1992, Journal of Molecular Biology 224: 899-904; Wlodaver et al., 1992,
FEBS Letters 309: 59-64).

Moreover, a nucleotide sequence encoding a polypeptide of the present invention may
be modified by introduction of nucleotide substitutions which do not give rise to another amino
acid sequence of the polypeptide encoded by the nucleotide sequence, but which correspond
to the codon usage of the host organism intended for production of the enzyme.
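
The idea of nucleotide substitutions that leave the encoded amino acid sequence unchanged can be illustrated with a small sketch. The codon table below is only a subset of the standard genetic code, and the "preferred" codons are invented for the example; they do not describe the codon usage of any actual host organism. All names are hypothetical.

```python
# Subset of the standard genetic code, sufficient for the toy example.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "TTA": "L", "CTG": "L", "CTC": "L",
    "TAA": "*", "TGA": "*",
}

# Hypothetical host preference (an assumption made for illustration only).
PREFERRED_CODON = {"M": "ATG", "A": "GCC", "K": "AAG", "L": "CTC", "*": "TAA"}


def codons(cds: str):
    """Split a coding sequence into codons."""
    if len(cds) % 3:
        raise ValueError("coding sequence must be a whole number of codons")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(cds))


def recode_for_host(cds: str) -> str:
    """Replace each codon by the host-preferred synonymous codon."""
    return "".join(PREFERRED_CODON[CODON_TABLE[c]] for c in codons(cds))


original = "ATGGCTAAATTATGA"  # toy coding sequence
recoded = recode_for_host(original)
```

The two nucleotide sequences differ at several positions yet translate to the same polypeptide, which is also what is meant above by sequences that differ "by virtue of the degeneracy of the genetic code".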
The introduction of a mutation into the nucleotide sequence to exchange one nucleotide
for another nucleotide may be accomplished by site-directed mutagenesis using any of the
methods known in the art. Particularly useful is the procedure which utilizes a supercoiled,
double stranded DNA vector with an insert of interest and two synthetic primers containing the
desired mutation. The oligonucleotide primers, each complementary to opposite strands of
the vector, extend during temperature cycling by means of Pfu DNA polymerase. On
incorporation of the primers, a mutated plasmid containing staggered nicks is generated.
Following temperature cycling, the product is treated with DpnI, which is specific for methylated
and hemimethylated DNA, to digest the parental DNA template and to select for
mutation-containing synthesized DNA. Other procedures known in the art may also be used. For a
general description of nucleotide substitution, see, e.g., Ford et al., 1991, Protein Expression
and Purification 2: 95-107.
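A pair of primers carrying the desired mutation, each complementary to an opposite strand of the template, might be derived as in the sketch below. The function names and the `flank` length are illustrative assumptions; practical primer design also weighs factors such as melting temperature, which this sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_primers(template: str, position: int, new_base: str, flank: int = 12):
    """Build two complementary primers that carry a single-base substitution.

    `position` is 1-based. `flank` (an illustrative default) is the number of
    unchanged template bases kept on each side of the mutated base.
    """
    template = template.upper()
    new_base = new_base.upper()
    idx = position - 1
    if not 0 <= idx < len(template):
        raise ValueError("position lies outside the template")
    if idx < flank or idx + flank >= len(template):
        raise ValueError("not enough flanking sequence around the position")
    if template[idx] == new_base:
        raise ValueError("new base is identical to the template base")
    forward = template[idx - flank:idx] + new_base + template[idx + 1:idx + 1 + flank]
    # The second primer anneals to the opposite strand and carries the same change.
    return forward, reverse_complement(forward)
```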

The present invention also relates to a polynucleotide comprising, preferably consisting
of, a nucleotide sequence which encodes a polypeptide having cellobiohydrolase I activity, and
which hybridizes under very low stringency conditions, preferably under low stringency
conditions, more preferably under medium stringency conditions, more preferably under
medium-high stringency conditions, even more preferably under high stringency conditions,
and most preferably under very high stringency conditions with a polynucleotide probe
selected from the group consisting of

(i) the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 1 to 1578 of SEQ ID NO:1,
nucleotides 1 to 1302 of SEQ ID NO:1,
nucleotides 1 to 1587 of SEQ ID NO:3,
nucleotides 1 to 1302 of SEQ ID NO:3,
nucleotides 1 to 1353 of SEQ ID NO:5,
nucleotides 1 to 1302 of SEQ ID NO:5,
nucleotides 1 to 1371 of SEQ ID NO:7,
nucleotides 1 to 1302 of SEQ ID NO:7,
nucleotides 1 to 1614 of SEQ ID NO:9,
nucleotides 1 to 1302 of SEQ ID NO:9,
nucleotides 1 to 1245 of SEQ ID NO:11,
nucleotides 1 to 1341 of SEQ ID NO:13,
nucleotides 1 to 1302 of SEQ ID NO:13,

nucleotides 1 to 1356 of SEQ ID NO:15,
nucleotides 1 to 1302 of SEQ ID NO:15,
nucleotides 1 to 1365 of SEQ ID NO:37,
nucleotides 1 to 1302 of SEQ ID NO:37,
nucleotides 1 to 1377 of SEQ ID NO:39,
nucleotides 1 to 1302 of SEQ ID NO:39,
nucleotides 1 to 1353 of SEQ ID NO:41,
nucleotides 1 to 1302 of SEQ ID NO:41,
nucleotides 1 to 1341 of SEQ ID NO:43,
nucleotides 1 to 1302 of SEQ ID NO:43,
nucleotides 1 to 1584 of SEQ ID NO:45,
nucleotides 1 to 1302 of SEQ ID NO:45,
nucleotides 1 to 1368 of SEQ ID NO:47,
nucleotides 1 to 1302 of SEQ ID NO:47,
nucleotides 1 to 1395 of SEQ ID NO:49,
nucleotides 1 to 1302 of SEQ ID NO:49,
nucleotides 1 to 1383 of SEQ ID NO:51,
nucleotides 1 to 1302 of SEQ ID NO:51,
nucleotides 1 to 1353 of SEQ ID NO:53,
nucleotides 1 to 1302 of SEQ ID NO:53,
nucleotides 1 to 1599 of SEQ ID NO:55,
nucleotides 1 to 1302 of SEQ ID NO:55,
nucleotides 1 to 1383 of SEQ ID NO:57,
nucleotides 1 to 1302 of SEQ ID NO:57,
nucleotides 1 to 1578 of SEQ ID NO:59,
nucleotides 1 to 1302 of SEQ ID NO:59,
nucleotides 1 to 1371 of SEQ ID NO:65, and
nucleotides 1 to 1302 of SEQ ID NO:65;

(ii) the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 1 to 500 of SEQ ID NO:1,
nucleotides 1 to 500 of SEQ ID NO:3,
nucleotides 1 to 500 of SEQ ID NO:5,
nucleotides 1 to 500 of SEQ ID NO:7,
nucleotides 1 to 500 of SEQ ID NO:9,
nucleotides 1 to 500 of SEQ ID NO:11,
nucleotides 1 to 500 of SEQ ID NO:13,
nucleotides 1 to 500 of SEQ ID NO:15,
nucleotides 1 to 500 of SEQ ID NO:37,
nucleotides 1 to 500 of SEQ ID NO:39,
nucleotides 1 to 500 of SEQ ID NO:41,
nucleotides 1 to 500 of SEQ ID NO:43,
nucleotides 1 to 500 of SEQ ID NO:45,
nucleotides 1 to 500 of SEQ ID NO:47,
nucleotides 1 to 500 of SEQ ID NO:49,
nucleotides 1 to 500 of SEQ ID NO:51,
nucleotides 1 to 500 of SEQ ID NO:53,
nucleotides 1 to 500 of SEQ ID NO:55,
nucleotides 1 to 500 of SEQ ID NO:57,
nucleotides 1 to 500 of SEQ ID NO:59,
nucleotides 1 to 500 of SEQ ID NO:65,
nucleotides 1 to 221 of SEQ ID NO:17,
nucleotides 1 to 239 of SEQ ID NO:18,
nucleotides 1 to 199 of SEQ ID NO:19,
nucleotides 1 to 191 of SEQ ID NO:20,
nucleotides 1 to 232 of SEQ ID NO:21,
nucleotides 1 to 467 of SEQ ID NO:22,
nucleotides 1 to 534 of SEQ ID NO:23,
nucleotides 1 to 563 of SEQ ID NO:24,
nucleotides 1 to 218 of SEQ ID NO:25,
nucleotides 1 to 492 of SEQ ID NO:26,
nucleotides 1 to 481 of SEQ ID NO:27,
nucleotides 1 to 463 of SEQ ID NO:28,
nucleotides 1 to 513 of SEQ ID NO:29,
nucleotides 1 to 579 of SEQ ID NO:30,
nucleotides 1 to 514 of SEQ ID NO:31,
nucleotides 1 to 477 of SEQ ID NO:32,
nucleotides 1 to 500 of SEQ ID NO:33,
nucleotides 1 to 470 of SEQ ID NO:34,
nucleotides 1 to 491 of SEQ ID NO:35,
nucleotides 1 to 221 of SEQ ID NO:36,
nucleotides 1 to 519 of SEQ ID NO:61,
nucleotides 1 to 497 of SEQ ID NO:62,
nucleotides 1 to 498 of SEQ ID NO:63,
nucleotides 1 to 525 of SEQ ID NO:64, and
nucleotides 1 to 951 of SEQ ID NO:67; and

(iii) the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 1 to 200 of SEQ ID NO:1,
nucleotides 1 to 200 of SEQ ID NO:3,
nucleotides 1 to 200 of SEQ ID NO:5,
nucleotides 1 to 200 of SEQ ID NO:7,
nucleotides 1 to 200 of SEQ ID NO:9,
nucleotides 1 to 200 of SEQ ID NO:11,
nucleotides 1 to 200 of SEQ ID NO:13,
nucleotides 1 to 200 of SEQ ID NO:15,
nucleotides 1 to 200 of SEQ ID NO:37,
nucleotides 1 to 200 of SEQ ID NO:39,
nucleotides 1 to 200 of SEQ ID NO:41,
nucleotides 1 to 200 of SEQ ID NO:43,
nucleotides 1 to 200 of SEQ ID NO:45,
nucleotides 1 to 200 of SEQ ID NO:47,
nucleotides 1 to 200 of SEQ ID NO:49,
nucleotides 1 to 200 of SEQ ID NO:51,
nucleotides 1 to 200 of SEQ ID NO:53,
nucleotides 1 to 200 of SEQ ID NO:55,
nucleotides 1 to 200 of SEQ ID NO:57,
nucleotides 1 to 200 of SEQ ID NO:59,
nucleotides 1 to 200 of SEQ ID NO:65,
nucleotides 1 to 200 of SEQ ID NO:22,
nucleotides 1 to 200 of SEQ ID NO:23,
nucleotides 1 to 200 of SEQ ID NO:24,
nucleotides 1 to 200 of SEQ ID NO:26,
nucleotides 1 to 200 of SEQ ID NO:27,
nucleotides 1 to 200 of SEQ ID NO:28,
nucleotides 1 to 200 of SEQ ID NO:29,
nucleotides 1 to 200 of SEQ ID NO:30,
nucleotides 1 to 200 of SEQ ID NO:31,
nucleotides 1 to 200 of SEQ ID NO:32,
nucleotides 1 to 200 of SEQ ID NO:33,
nucleotides 1 to 200 of SEQ ID NO:34,
nucleotides 1 to 200 of SEQ ID NO:35,
nucleotides 1 to 200 of SEQ ID NO:61,
nucleotides 1 to 200 of SEQ ID NO:62,
nucleotides 1 to 200 of SEQ ID NO:63,
nucleotides 1 to 200 of SEQ ID NO:64, and
nucleotides 1 to 200 of SEQ ID NO:67.

As will be understood, details and particulars concerning hybridization of the nucleotide
sequences will be the same or analogous to the hybridization aspects discussed in the section
entitled "Polypeptides Having Cellobiohydrolase I Activity" herein.
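Each probe listed above is defined as the complementary strand of a 1-based, inclusive nucleotide range of a given sequence. A minimal sketch of how such a probe could be derived from a sequence string is shown below; the function name is hypothetical, and the sequence in the usage note is a toy string, not one of the SEQ ID NOs of the application.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def probe_strand(sequence: str, start: int, end: int) -> str:
    """Return the complementary strand (read 5' to 3') of nucleotides start..end.

    Coordinates are 1-based and inclusive, matching the "nucleotides 1 to N"
    wording used in the list of probes.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("range must satisfy 1 <= start <= end <= len(sequence)")
    region = sequence[start - 1:end]  # convert to Python's 0-based, half-open slice
    return region.translate(_COMPLEMENT)[::-1]
```

For a toy input, `probe_strand("ATGCCGTA", 1, 4)` takes the region `ATGC` and returns its complementary strand.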



Nucleic Acid Constructs 

The present invention also relates to nucleic acid constructs comprising a nucleotide
sequence of the present invention operably linked to one or more control sequences that
direct the expression of the coding sequence in a suitable host cell under conditions
compatible with the control sequences.

A nucleotide sequence encoding a polypeptide of the present invention may be
manipulated in a variety of ways to provide for expression of the polypeptide. Manipulation of
the nucleotide sequence prior to its insertion into a vector may be desirable or necessary
depending on the expression vector. The techniques for modifying nucleotide sequences
utilizing recombinant DNA methods are well known in the art.

The control sequence may be an appropriate promoter sequence, a nucleotide
sequence which is recognized by a host cell for expression of the nucleotide sequence. The
promoter sequence contains transcriptional control sequences, which mediate the expression
of the polypeptide. The promoter may be any nucleotide sequence which shows
transcriptional activity in the host cell of choice including mutant, truncated, and hybrid
promoters, and may be obtained from genes encoding extracellular or intracellular
polypeptides either homologous or heterologous to the host cell.

Examples of suitable promoters for directing the transcription of the nucleic acid
constructs of the present invention, especially in a bacterial host cell, are the promoters
obtained from the E. coli lac operon, Streptomyces coelicolor agarase gene (dagA), Bacillus
subtilis levansucrase gene (sacB), Bacillus licheniformis alpha-amylase gene (amyL), Bacillus
stearothermophilus maltogenic amylase gene (amyM), Bacillus amyloliquefaciens
alpha-amylase gene (amyQ), Bacillus licheniformis penicillinase gene (penP), Bacillus subtilis xylA
and xylB genes, and prokaryotic beta-lactamase gene (Villa-Kamaroff et al., 1978,
Proceedings of the National Academy of Sciences USA 75: 3727-3731), as well as the tac
promoter (DeBoer et al., 1983, Proceedings of the National Academy of Sciences USA 80:
21-25). Further promoters are described in "Useful proteins from recombinant bacteria" in
Scientific American, 1980, 242: 74-94; and in Sambrook et al., 1989, supra.

Examples of suitable promoters for directing the transcription of the nucleic acid
constructs of the present invention in a filamentous fungal host cell are promoters obtained
from the genes for Aspergillus oryzae TAKA amylase, Rhizomucor miehei aspartic proteinase,
Aspergillus niger neutral alpha-amylase, Aspergillus niger acid stable alpha-amylase,
Aspergillus niger or Aspergillus awamori glucoamylase (glaA), Rhizomucor miehei lipase,
Aspergillus oryzae alkaline protease, Aspergillus oryzae triose phosphate isomerase,
Aspergillus nidulans acetamidase, and Fusarium oxysporum trypsin-like protease (WO
96/00787), as well as the NA2-tpi promoter (a hybrid of the promoters from the genes for
Aspergillus niger neutral alpha-amylase and Aspergillus oryzae triose phosphate isomerase),
and mutant, truncated, and hybrid promoters thereof.

In a yeast host, useful promoters are obtained from the genes for Saccharomyces
cerevisiae enolase (ENO-1), Saccharomyces cerevisiae galactokinase (GAL1),
Saccharomyces cerevisiae alcohol dehydrogenase/glyceraldehyde-3-phosphate
dehydrogenase (ADH2/GAP), and Saccharomyces cerevisiae 3-phosphoglycerate kinase.
Other useful promoters for yeast host cells are described by Romanos et al., 1992, Yeast 8:
423-488.

The control sequence may also be a suitable transcription terminator sequence, a
sequence recognized by a host cell to terminate transcription. The terminator sequence is
operably linked to the 3' terminus of the nucleotide sequence encoding the polypeptide. Any
terminator which is functional in the host cell of choice may be used in the present invention.

Preferred terminators for filamentous fungal host cells are obtained from the genes for
Aspergillus oryzae TAKA amylase, Aspergillus niger glucoamylase, Aspergillus nidulans
anthranilate synthase, Aspergillus niger alpha-glucosidase, and Fusarium oxysporum
trypsin-like protease.

Preferred terminators for yeast host cells are obtained from the genes for
Saccharomyces cerevisiae enolase, Saccharomyces cerevisiae cytochrome C (CYC1), and
Saccharomyces cerevisiae glyceraldehyde-3-phosphate dehydrogenase. Other useful
terminators for yeast host cells are described by Romanos et al., 1992, supra.

The control sequence may also be a suitable leader sequence, a nontranslated region of
an mRNA which is important for translation by the host cell. The leader sequence is operably
linked to the 5' terminus of the nucleotide sequence encoding the polypeptide. Any leader
sequence that is functional in the host cell of choice may be used in the present invention.

Preferred leaders for filamentous fungal host cells are obtained from the genes for 
Aspergillus oryzae TAKA amylase and Aspergillus nidulans triose phosphate isomerase. 

Suitable leaders for yeast host cells are obtained from the genes for Saccharomyces
cerevisiae enolase (ENO-1), Saccharomyces cerevisiae 3-phosphoglycerate kinase,
Saccharomyces cerevisiae alpha-factor, and Saccharomyces cerevisiae alcohol
dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase (ADH2/GAP).

The control sequence may also be a polyadenylation sequence, a sequence operably
linked to the 3' terminus of the nucleotide sequence and which, when transcribed, is
recognized by the host cell as a signal to add polyadenosine residues to transcribed mRNA.
Any polyadenylation sequence which is functional in the host cell of choice may be used in the
present invention.

Preferred polyadenylation sequences for filamentous fungal host cells are obtained from 
the genes for Aspergillus oryzae TAKA amylase, Aspergillus niger glucoamylase, Aspergillus 
nidulans anthranilate synthase, Fusarium oxysporum trypsin-like protease, and Aspergillus 
niger alpha-glucosidase. 

Useful polyadenylation sequences for yeast host cells are described by Guo and
Sherman, 1995, Molecular Cellular Biology 15: 5983-5990.

The control sequence may also be a signal peptide coding region that codes for an
amino acid sequence linked to the amino terminus of a polypeptide and directs the encoded
polypeptide into the cell's secretory pathway. The 5' end of the coding sequence of the
nucleotide sequence may inherently contain a signal peptide coding region naturally linked in
translation reading frame with the segment of the coding region which encodes the secreted
polypeptide. Alternatively, the 5' end of the coding sequence may contain a signal peptide
coding region which is foreign to the coding sequence. The foreign signal peptide coding
region may be required where the coding sequence does not naturally contain a signal peptide
coding region. Alternatively, the foreign signal peptide coding region may simply replace the
natural signal peptide coding region in order to enhance secretion of the polypeptide.
However, any signal peptide coding region which directs the expressed polypeptide into the
secretory pathway of a host cell of choice may be used in the present invention.

Effective signal peptide coding regions for bacterial host cells are the signal peptide
coding regions obtained from the genes for Bacillus NCIB 11837 maltogenic amylase, Bacillus
stearothermophilus alpha-amylase, Bacillus licheniformis subtilisin, Bacillus licheniformis
beta-lactamase, Bacillus stearothermophilus neutral proteases (nprT, nprS, nprM), and Bacillus
subtilis prsA. Further signal peptides are described by Simonen and Palva, 1993,
Microbiological Reviews 57: 109-137.

Effective signal peptide coding regions for filamentous fungal host cells are the signal
peptide coding regions obtained from the genes for Aspergillus oryzae TAKA amylase,
Aspergillus niger neutral amylase, Aspergillus niger glucoamylase, Rhizomucor miehei aspartic
proteinase, Humicola insolens cellulase, and Humicola lanuginosa lipase.

Useful signal peptides for yeast host cells are obtained from the genes for
Saccharomyces cerevisiae alpha-factor and Saccharomyces cerevisiae invertase. Other
useful signal peptide coding regions are described by Romanos et al., 1992, supra.

The control sequence may also be a propeptide coding region that codes for an amino
acid sequence positioned at the amino terminus of a polypeptide. The resultant polypeptide is
known as a proenzyme or propolypeptide (or a zymogen in some cases). A propolypeptide is
generally inactive and can be converted to a mature active polypeptide by catalytic or
autocatalytic cleavage of the propeptide from the propolypeptide. The propeptide coding
region may be obtained from the genes for Bacillus subtilis alkaline protease (aprE), Bacillus
subtilis neutral protease (nprT), Saccharomyces cerevisiae alpha-factor, Rhizomucor miehei
aspartic proteinase, and Myceliophthora thermophila laccase (WO 95/33836).

Where both signal peptide and propeptide regions are present at the amino terminus of
a polypeptide, the propeptide region is positioned next to the amino terminus of a polypeptide
and the signal peptide region is positioned next to the amino terminus of the propeptide
region.
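The relative positions of the control sequences described above can be summarized in a small data structure. The sketch below is illustrative only: the class and field names are hypothetical, placing the promoter upstream of the leader is an assumption based on general practice rather than a statement in the text, and the polyadenylation sequence is omitted because its position relative to the terminator is not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Construct:
    """Named parts of a hypothetical expression cassette; None means absent."""
    promoter: str
    coding_sequence: str
    leader: Optional[str] = None
    signal_peptide: Optional[str] = None
    propeptide: Optional[str] = None
    terminator: Optional[str] = None

    def layout(self) -> List[str]:
        """List the parts that are present, ordered from the 5' end to the 3' end."""
        ordered = [
            ("promoter", self.promoter),              # assumed upstream of all other parts
            ("leader", self.leader),                  # 5' of the coding sequence
            ("signal_peptide", self.signal_peptide),  # next to the propeptide's amino terminus
            ("propeptide", self.propeptide),          # next to the polypeptide's amino terminus
            ("coding_sequence", self.coding_sequence),
            ("terminator", self.terminator),          # 3' of the coding sequence
        ]
        return [name for name, part in ordered if part is not None]
```

Calling `layout()` on a construct that carries both a signal peptide and a propeptide lists the signal peptide first, then the propeptide, then the coding sequence, which mirrors the ordering rule stated in the preceding paragraph.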

It may also be desirable to add regulatory sequences which allow the regulation of the
expression of the polypeptide relative to the growth of the host cell. Examples of regulatory
systems are those which cause the expression of the gene to be turned on or off in response
to a chemical or physical stimulus, including the presence of a regulatory compound.
Regulatory systems in prokaryotic systems include the lac, tac, and trp operator systems. In
yeast, the ADH2 system or GAL1 system may be used. In filamentous fungi, the TAKA
alpha-amylase promoter, Aspergillus niger glucoamylase promoter, and Aspergillus oryzae
glucoamylase promoter may be used as regulatory sequences. Other examples of regulatory
sequences are those which allow for gene amplification. In eukaryotic systems, these include
the dihydrofolate reductase gene which is amplified in the presence of methotrexate, and the
metallothionein genes which are amplified with heavy metals. In these cases, the nucleotide
sequence encoding the polypeptide would be operably linked with the regulatory sequence.



Expression Vectors

The present invention also relates to recombinant expression vectors comprising the
nucleic acid construct of the invention. The various nucleotide and control sequences
described above may be joined together to produce a recombinant expression vector which
may include one or more convenient restriction sites to allow for insertion or substitution of the
nucleotide sequence encoding the polypeptide at such sites. Alternatively, the nucleotide
sequence of the present invention may be expressed by inserting the nucleotide sequence or
a nucleic acid construct comprising the sequence into an appropriate vector for expression. In
creating the expression vector, the coding sequence is located in the vector so that the coding
sequence is operably linked with the appropriate control sequences for expression.

The recombinant expression vector may be any vector (e.g., a plasmid or virus) which
can be conveniently subjected to recombinant DNA procedures and can bring about the
expression of the nucleotide sequence. The choice of the vector will typically depend on the
compatibility of the vector with the host cell into which the vector is to be introduced. The
vectors may be linear or closed circular plasmids.

The vector may be an autonomously replicating vector, i.e., a vector which exists as an
extrachromosomal entity, the replication of which is independent of chromosomal replication,
e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial
chromosome.

The vector may contain any means for assuring self-replication. Alternatively, the vector
may be one which, when introduced into the host cell, is integrated into the genome and
replicated together with the chromosome(s) into which it has been integrated. Furthermore, a
single vector or plasmid or two or more vectors or plasmids which together contain the total
DNA to be introduced into the genome of the host cell, or a transposon may be used.

The vectors of the present invention preferably contain one or more selectable markers
which permit easy selection of transformed cells. A selectable marker is a gene the product of
which provides for biocide or viral resistance, resistance to heavy metals, prototrophy to
auxotrophs, and the like.

Examples of bacterial selectable markers are the dal genes from Bacillus subtilis or
Bacillus licheniformis, or markers which confer antibiotic resistance such as ampicillin,
kanamycin, chloramphenicol or tetracycline resistance. Suitable markers for yeast host cells
are ADE2, HIS3, LEU2, LYS2, MET3, TRP1, and URA3. Selectable markers for use in a
filamentous fungal host cell include, but are not limited to, amdS (acetamidase), argB
(ornithine carbamoyltransferase), bar (phosphinothricin acetyltransferase), hygB (hygromycin
phosphotransferase), niaD (nitrate reductase), pyrG (orotidine-5'-phosphate decarboxylase),
sC (sulfate adenyltransferase), trpC (anthranilate synthase), as well as equivalents thereof.

Preferred for use in an Aspergillus cell are the amdS and pyrG genes of Aspergillus
nidulans or Aspergillus oryzae and the bar gene of Streptomyces hygroscopicus.

The vectors of the present invention preferably contain an element(s) that permits stable 
integration of the vector into the host cell's genome or autonomous replication of the vector in 
the cell independent of the genome. 

For integration into the host cell genome, the vector may rely on the nucleotide
sequence encoding the polypeptide or any other element of the vector for stable integration of
the vector into the genome by homologous or nonhomologous recombination. Alternatively,
the vector may contain additional nucleotide sequences for directing integration by
homologous recombination into the genome of the host cell. The additional nucleotide
sequences enable the vector to be integrated into the host cell genome at a precise location(s)
in the chromosome(s). To increase the likelihood of integration at a precise location, the
integrational elements should preferably contain a sufficient number of nucleotides, such as
100 to 1,500 base pairs, preferably 400 to 1,500 base pairs, and most preferably 800 to 1,500
base pairs, which are highly homologous with the corresponding target sequence to enhance
the probability of homologous recombination. The integrational elements may be any
sequence that is homologous with the target sequence in the genome of the host cell.
Furthermore, the integrational elements may be non-encoding or encoding nucleotide
sequences. On the other hand, the vector may be integrated into the genome of the host cell
by non-homologous recombination.
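The nested length ranges given above for the integrational elements can be expressed as a simple check. This is a sketch under the assumption that the ranges are treated as inclusive; the function name and the returned labels are hypothetical, and the text presents the ranges as examples rather than hard limits.

```python
def integrational_element_tier(length_bp: int) -> str:
    """Classify an integrational element by length against the stated ranges.

    The narrowest matching range wins: 800-1,500 bp, then 400-1,500 bp,
    then 100-1,500 bp. Bounds are assumed to be inclusive.
    """
    if length_bp < 0:
        raise ValueError("length must be non-negative")
    if 800 <= length_bp <= 1500:
        return "most preferred"
    if 400 <= length_bp <= 1500:
        return "preferred"
    if 100 <= length_bp <= 1500:
        return "within example range"
    return "outside example ranges"
```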

For autonomous replication, the vector may further comprise an origin of replication
enabling the vector to replicate autonomously in the host cell in question. Examples of
bacterial origins of replication are the origins of replication of plasmids pBR322, pUC19,
pACYC177, and pACYC184 permitting replication in E. coli, and pUB110, pE194, pTA1060,
and pAMB1 permitting replication in Bacillus. Examples of origins of replication for use in a
yeast host cell are the 2 micron origin of replication, ARS1, ARS4, the combination of ARS1
and CEN3, and the combination of ARS4 and CEN6. The origin of replication may be one
having a mutation which makes its functioning temperature-sensitive in the host cell (see, e.g.,
Ehrlich, 1978, Proceedings of the National Academy of Sciences USA 75: 1433).

More than one copy of a nucleotide sequence of the present invention may be inserted
into the host cell to increase production of the gene product. An increase in the copy number
of the nucleotide sequence can be obtained by integrating at least one additional copy of the
sequence into the host cell genome or by including an amplifiable selectable marker gene with
the nucleotide sequence where cells containing amplified copies of the selectable marker
gene, and thereby additional copies of the nucleotide sequence, can be selected for by
cultivating the cells in the presence of the appropriate selectable agent.

The procedures used to ligate the elements described above to construct the
recombinant expression vectors of the present invention are well known to one skilled in the
art (see, e.g., Sambrook et al., 1989, supra).

Host Cells 

The present invention also relates to recombinant host cells comprising the nucleic acid
construct of the invention, which are advantageously used in the recombinant production of
the polypeptides. A vector comprising a nucleotide sequence of the present invention is
introduced into a host cell so that the vector is maintained as a chromosomal integrant or as a
self-replicating extra-chromosomal vector as described earlier.

The host cell may be a unicellular microorganism, e.g., a prokaryote, or a non-unicellular 
microorganism, e.g., a eukaryote. 

Useful unicellular cells are bacterial cells such as gram positive bacteria including, but
not limited to, a Bacillus cell, e.g., Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus
brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus lautus, Bacillus lentus,
Bacillus licheniformis, Bacillus megaterium, Bacillus stearothermophilus, Bacillus subtilis, and
Bacillus thuringiensis; or a Streptomyces cell, e.g., Streptomyces lividans or Streptomyces
murinus, or gram negative bacteria such as E. coli and Pseudomonas sp. In a preferred
embodiment, the bacterial host cell is a Bacillus lentus, Bacillus licheniformis, Bacillus
stearothermophilus, or Bacillus subtilis cell. In another preferred embodiment, the Bacillus cell
is an alkalophilic Bacillus.

The introduction of a vector into a bacterial host cell may, for instance, be effected by
protoplast transformation (see, e.g., Chang and Cohen, 1979, Molecular General Genetics
168: 111-115), using competent cells (see, e.g., Young and Spizizin, 1961, Journal of
Bacteriology 81: 823-829, or Dubnau and Davidoff-Abelson, 1971, Journal of Molecular
Biology 56: 209-221), electroporation (see, e.g., Shigekawa and Dower, 1988, Biotechniques
6: 742-751), or conjugation (see, e.g., Koehler and Thorne, 1987, Journal of Bacteriology 169:
5771-5278).

The host cell may be a eukaryote, such as a mammalian, insect, plant, or fungal cell. 

In a preferred embodiment, the host cell is a fungal cell. "Fungi" as used herein includes
the phyla Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota (as defined by
Hawksworth et al., In, Ainsworth and Bisby's Dictionary of The Fungi, 8th edition, 1995, CAB
International, University Press, Cambridge, UK) as well as the Oomycota (as cited in
Hawksworth et al., 1995, supra, page 171) and all mitosporic fungi (Hawksworth et al., 1995,
supra).

In a more preferred embodiment, the fungal host cell is a yeast cell. "Yeast" as used
herein includes ascosporogenous yeast (Endomycetales), basidiosporogenous yeast, and
yeast belonging to the Fungi Imperfecti (Blastomycetes). Since the classification of yeast may
change in the future, for the purposes of this invention, yeast shall be defined as described in
Biology and Activities of Yeast (Skinner, F.A., Passmore, S.M., and Davenport, R.R., eds,
Soc. App. Bacteriol. Symposium Series No. 9, 1980).

In an even more preferred embodiment, the yeast host cell is a Candida, Aschbyii, 
Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell. 

In a most preferred embodiment, the yeast host cell is a Saccharomyces carlsbergensis,
Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii,
Saccharomyces kluyveri, Saccharomyces norbensis or Saccharomyces oviformis cell. In
another most preferred embodiment, the yeast host cell is a Kluyveromyces lactis cell. In
another most preferred embodiment, the yeast host cell is a Yarrowia lipolytica cell.

In another more preferred embodiment, the fungal host cell is a filamentous fungal cell.
"Filamentous fungi" include all filamentous forms of the subdivision Eumycota and Oomycota
(as defined by Hawksworth et al., 1995, supra). The filamentous fungi are characterized by a
mycelial wall composed of chitin, cellulose, glucan, chitosan, mannan, and other complex
polysaccharides. Vegetative growth is by hyphal elongation and carbon catabolism is
obligately aerobic. In contrast, vegetative growth by yeasts such as Saccharomyces
cerevisiae is by budding of a unicellular thallus and carbon catabolism may be fermentative.

In an even more preferred embodiment, the filamentous fungal host cell is a cell of a
species of, but not limited to, Acremonium, Aspergillus, Fusarium, Humicola, Mucor,
Myceliophthora, Neurospora, Penicillium, Thielavia, Tolypocladium, or Trichoderma.

In a most preferred embodiment, the filamentous fungal host cell is an Aspergillus
awamori, Aspergillus foetidus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger or
Aspergillus oryzae cell. In another most preferred embodiment, the filamentous fungal host
cell is a Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium
culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium
negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium
sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum,
Fusarium torulosum, Fusarium trichothecioides, or Fusarium venenatum cell. In an even most
preferred embodiment, the filamentous fungal parent cell is a Fusarium venenatum (Nirenberg
sp. nov.) cell. In another most preferred embodiment, the filamentous fungal host cell is a
Humicola insolens, Humicola lanuginosa, Mucor miehei, Myceliophthora thermophila,
Neurospora crassa, Penicillium purpurogenum, Thielavia terrestris, Trichoderma harzianum,
Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei, or Trichoderma
viride cell.

Fungal cells may be transformed by a process involving protoplast formation,
transformation of the protoplasts, and regeneration of the cell wall in a manner known per se.
Suitable procedures for transformation of Aspergillus host cells are described in EP 238 023
and Yelton et al., 1984, Proceedings of the National Academy of Sciences USA 81:
1470-1474. Suitable methods for transforming Fusarium species are described by Malardier et al.,
1989, Gene 78: 147-156 and WO 96/00787. Yeast may be transformed using the procedures
described by Becker and Guarente, In Abelson, J.N. and Simon, M.I., editors, Guide to Yeast
Genetics and Molecular Biology, Methods in Enzymology, Volume 194, pp 182-187, Academic
Press, Inc., New York; Ito et al., 1983, Journal of Bacteriology 153: 163; and Hinnen et al.,
1978, Proceedings of the National Academy of Sciences USA 75: 1920.



Methods of Production 

The present invention also relates to methods for producing a polypeptide of the present
invention comprising (a) cultivating a strain, which in its wild-type form is capable of producing
the polypeptide; and (b) recovering the polypeptide. Preferably, the strain is selected from the
group consisting of Acremonium, Scytalidium, Thermoascus, Thielavia, Verticillium,
Neotermes, Melanocarpus, Poitrasia, Coprinus, Trichothecium, Humicola, Cladorrhinum,
Diplodia, Myceliophthora, Rhizomucor, Meripilus, Exidia, Xylaria, Trichophaea, Chaetomium,
Chaetomidium, Sporotrichum, Thielavia, Aspergillus, Scopulariopsis, Fusarium,
Pseudoplectania, and Phytophthora; more preferably the strain is selected from the group
consisting of Acremonium thermophilum, Chaetomium thermophilum, Scytalidium
thermophilum, Thermoascus aurantiacus, Thielavia australiensis, Verticillium tenerum,
Neotermes castaneus, Melanocarpus albomyces, Poitrasia circinans, Coprinus cinereus,
Trichothecium roseum, Humicola nigrescens, Cladorrhinum foecundissimum, Diplodia
gossypina, Myceliophthora thermophila, Rhizomucor pusillus, Meripilus giganteus, Exidia
glandulosa, Xylaria hypoxylon, Trichophaea saccata, Chaetomidium pingtungium,
Myceliophthora thermophila, Myceliophthora hinnulea, Sporotrichum pruinosum, Thielavia cf.
microspora, Pseudoplectania nigrella, and Phytophthora infestans.

The present invention also relates to methods for producing a polypeptide of the present 
invention comprising (a) cultivating a host cell under conditions conducive for production of the 
polypeptide; and (b) recovering the polypeptide. 

The present invention also relates to methods for in-situ production of a polypeptide of
the present invention comprising (a) cultivating a host cell under conditions conducive for
production of the polypeptide; and (b) contacting the polypeptide with a desired substrate,
such as a cellulosic substrate, without prior recovery of the polypeptide. The term "in-situ
production" is intended to mean that the polypeptide is produced directly in the locus in which
it is intended to be used, such as in a fermentation process for production of ethanol.

In the production methods of the present invention, the cells are cultivated in a nutrient 
medium suitable for production of the polypeptide using methods known in the art. For 
example, the cell may be cultivated by shake flask cultivation, small-scale or large-scale 
fermentation (including continuous, batch, fed-batch, or solid state fermentations) in laboratory
or industrial fermentors performed in a suitable medium and under conditions allowing the
polypeptide to be expressed and/or isolated. The cultivation takes place in a suitable nutrient
medium comprising carbon and nitrogen sources and inorganic salts, using procedures known
in the art. Suitable media are available from commercial suppliers or may be prepared
according to published compositions (e.g., in catalogues of the American Type Culture
Collection). If the polypeptide is secreted into the nutrient medium, the polypeptide can be
recovered directly from the medium. If the polypeptide is not secreted, it can be recovered 
from cell lysates. 

The polypeptides may be detected using methods known in the art that are specific for 
the polypeptides. These detection methods may include use of specific antibodies, formation 
of an enzyme product, or disappearance of an enzyme substrate. For example, an enzyme
assay may be used to determine the activity of the polypeptide as described herein. 

The resulting polypeptide may be recovered by methods known in the art. For example,
the polypeptide may be recovered from the nutrient medium by conventional procedures
including, but not limited to, centrifugation, filtration, extraction, spray-drying, evaporation, or 
precipitation. 

The polypeptides of the present invention may be purified by a variety of procedures
known in the art including, but not limited to, chromatography (e.g., ion exchange, affinity,
hydrophobic, chromatofocusing, and size exclusion), electrophoretic procedures (e.g.,
preparative isoelectric focusing), differential solubility (e.g., ammonium sulfate precipitation),
SDS-PAGE, or extraction (see, e.g., Protein Purification, J.-C. Janson and Lars Ryden,
editors, VCH Publishers, New York, 1989).

Plants 

The present invention also relates to a transgenic plant, plant part, or plant cell which 
has been transformed with a nucleotide sequence encoding a polypeptide having 
cellobiohydrolase I activity of the present invention so as to express and produce the 
polypeptide in recoverable quantities. The polypeptide may be recovered from the plant or
plant part. Alternatively, the plant or plant part containing the recombinant polypeptide may be
used as such for improving the quality of a food or feed, e.g., improving nutritional value,
palatability, and rheological properties, or to destroy an antinutritive factor.

The transgenic plant can be dicotyledonous (a dicot) or monocotyledonous (a monocot). 
Examples of monocot plants are grasses, such as meadow grass (blue grass, Poa), forage
grasses such as Festuca and Lolium, temperate grasses such as Agrostis, and cereals, e.g., wheat,
oats, rye, barley, rice, sorghum, millets, and maize (corn).

Examples of dicot plants are tobacco, lupins, potato, sugar beet, legumes, such as pea, 
bean and soybean, and cruciferous plants (family Brassicaceae), such as cauliflower, rape, 
canola, and the closely related model organism Arabidopsis thaliana.

Examples of plant parts are stem, callus, leaves, root, fruits, seeds, and tubers. Also 
specific plant tissues, such as chloroplast, apoplast, mitochondria, vacuole, peroxisomes, and 
cytoplasm are considered to be a plant part. Furthermore, any plant cell, whatever the tissue 
origin, is considered to be a plant part.

Also included within the scope of the present invention are the progeny (clonal or seed)
of such plants, plant parts and plant cells.

The transgenic plant or plant cell expressing a polypeptide of the present invention may 
be constructed in accordance with methods known in the art. Briefly, the plant or plant cell is 
constructed by incorporating one or more expression constructs encoding a polypeptide of the 
present invention into the plant host genome and propagating the resulting modified plant or
plant cell into a transgenic plant or plant cell. 

Conveniently, the expression construct is a nucleic acid construct which comprises a
nucleotide sequence encoding a polypeptide of the present invention operably linked with
appropriate regulatory sequences required for expression of the nucleotide sequence in the
plant or plant part of choice. Furthermore, the expression construct may comprise a
selectable marker useful for identifying host cells into which the expression construct has been
integrated and DNA sequences necessary for introduction of the construct into the plant in
question (the latter depends on the DNA introduction method to be used). 

The choice of regulatory sequences, such as promoter and terminator sequences and 
optionally signal or transit sequences, is determined, for example, on the basis of when, where,
and how the polypeptide is desired to be expressed. For instance, the expression of the gene
encoding a polypeptide of the present invention may be constitutive or inducible, or may be
developmental, stage or tissue specific, and the gene product may be targeted to a specific
tissue or plant part such as seeds or leaves. Regulatory sequences are, for example,
described by Tague et al., 1988, Plant Physiology 86: 506.

For constitutive expression, the 35S-CaMV promoter may be used (Franck et al., 1980,
Cell 21: 285-294). Organ-specific promoters may be, for example, a promoter from storage
sink tissues such as seeds, potato tubers, and fruits (Edwards & Coruzzi, 1990, Ann. Rev.
Genet. 24: 275-303), or from metabolic sink tissues such as meristems (Ito et al., 1994, Plant
Mol. Biol. 24: 863-878), a seed specific promoter such as the glutelin, prolamin, globulin, or
albumin promoter from rice (Wu et al., 1998, Plant and Cell Physiology 39: 885-889), a Vicia
faba promoter from the legumin B4 and the unknown seed protein gene from Vicia faba
(Conrad et al., 1998, Journal of Plant Physiology 152: 708-711), a promoter from a seed oil
body protein (Chen et al., 1998, Plant and Cell Physiology 39: 935-941), the storage protein
napA promoter from Brassica napus, or any other seed specific promoter known in the art,
e.g., as described in WO 91/14772. Furthermore, the promoter may be a leaf specific
promoter such as the rbcs promoter from rice or tomato (Kyozuka et al., 1993, Plant
Physiology 102: 991-1000), the chlorella virus adenine methyltransferase gene promoter (Mitra
and Higgins, 1994, Plant Molecular Biology 26: 85-93), or the aldP gene promoter from rice
(Kagaya et al., 1995, Molecular and General Genetics 248: 668-674), or a wound inducible
promoter such as the potato pin2 promoter (Xu et al., 1993, Plant Molecular Biology 22: 573-588).

A promoter enhancer element may also be used to achieve higher expression of the 
enzyme in the plant. For instance, the promoter enhancer element may be an intron which is 
placed between the promoter and the nucleotide sequence encoding a polypeptide of the 
present invention. For instance, Xu et al., 1993, supra, disclose the use of the first intron of the
rice actin 1 gene to enhance expression.

The selectable marker gene and any other parts of the expression construct may be 
chosen from those available in the art.

The nucleic acid construct is incorporated into the plant genome according to
conventional techniques known in the art, including Agrobacterium-mediated transformation,
virus-mediated transformation, microinjection, particle bombardment, biolistic transformation,
and electroporation (Gasser et al., 1990, Science 244: 1293; Potrykus, 1990, Bio/Technology
8: 535; Shimamoto et al., 1989, Nature 338: 274).

Presently, Agrobacterium tumefaciens-mediated gene transfer is the method of choice
for generating transgenic dicots (for a review, see Hooykaas and Schilperoort, 1992, Plant
Molecular Biology 19: 15-38). However, it can also be used for transforming monocots,
although other transformation methods are generally preferred for these plants. Presently, the
method of choice for generating transgenic monocots is particle bombardment (microscopic
gold or tungsten particles coated with the transforming DNA) of embryonic calli or developing
embryos (Christou, 1992, Plant Journal 2: 275-281; Shimamoto, 1994, Current Opinion
Biotechnology 5: 158-162; Vasil et al., 1992, Bio/Technology 10: 667-674). An alternative
method for transformation of monocots is based on protoplast transformation as described by
Omirulleh et al., 1993, Plant Molecular Biology 21: 415-428.

Following transformation, the transformants having incorporated therein the expression 
construct are selected and regenerated into whole plants according to methods well-known in 
the art. 

The present invention also relates to methods for producing a polypeptide of the present 
invention comprising (a) cultivating a transgenic plant or a plant cell comprising a nucleotide
sequence encoding a polypeptide having cellobiohydrolase I activity of the present invention 
under conditions conducive for production of the polypeptide; and (b) recovering the 
polypeptide. 

The present invention also relates to methods for in-situ production of a polypeptide of 
the present invention comprising (a) cultivating a transgenic plant or a plant cell comprising a
nucleotide sequence encoding a polypeptide having cellobiohydrolase I activity of the present 
invention under conditions conducive for production of the polypeptide; and (b) contacting the 
polypeptide with a desired substrate, such as a cellulosic substrate, without prior recovery of 
the polypeptide. 

Compositions 

In a still further aspect, the present invention relates to compositions comprising a 
polypeptide of the present invention. 

The composition may comprise a polypeptide of the invention as the major enzymatic 
component, e.g., a mono-component composition. Alternatively, the composition may
comprise multiple enzymatic activities, such as an aminopeptidase, amylase, carbohydrase,
carboxypeptidase, catalase, cellulase, chitinase, cutinase, cyclodextrin glycosyltransferase,
deoxyribonuclease, esterase, alpha-galactosidase, beta-galactosidase, glucoamylase,
alpha-glucosidase, beta-glucosidase, haloperoxidase, invertase, laccase, lipase, mannosidase,
oxidase, pectinolytic enzyme, peptidoglutaminase, peroxidase, phytase, polyphenoloxidase,
proteolytic enzyme, ribonuclease, transglutaminase, or xylanase.

The compositions may be prepared in accordance with methods known in the art and
may be in the form of a liquid or a dry composition. For instance, the polypeptide composition
may be in the form of a granulate or a microgranulate. The polypeptide to be included in the
composition may be stabilized in accordance with methods known in the art.

Examples are given below of preferred uses of the polypeptide compositions of the 
invention. The dosage of the polypeptide composition of the invention and other conditions
under which the composition is used may be determined on the basis of methods known in the 
art. 

Detergent Compositions 

The polypeptide of the invention may be added to and thus become a component of a
detergent composition.

The detergent composition of the invention may for example be formulated as a hand or 
machine laundry detergent composition including a laundry additive composition suitable for 
pre-treatment of stained fabrics and a rinse added fabric softener composition, or be
formulated as a detergent composition for use in general household hard surface cleaning
operations, or be formulated for hand or machine dishwashing operations. 

In a specific aspect, the invention provides a detergent additive comprising the 
polypeptide of the invention. The detergent additive as well as the detergent composition may 
comprise one or more other enzymes such as a protease, a lipase, a cutinase, an amylase, a
carbohydrase, a cellulase, a pectinase, a mannanase, an arabinase, a galactanase, a
xylanase, an oxidase, e.g., a laccase, and/or a peroxidase. 

In general the properties of the chosen enzyme(s) should be compatible with the 
selected detergent (i.e., pH-optimum, compatibility with other enzymatic and non-enzymatic
ingredients, etc.), and the enzyme(s) should be present in effective amounts. 

Proteases: Suitable proteases include those of animal, vegetable or microbial origin. Microbial
origin is preferred. Chemically modified or protein engineered mutants are included. The
protease may be a serine protease or a metallo protease, preferably an alkaline microbial
protease or a trypsin-like protease. Examples of alkaline proteases are subtilisins, especially
those derived from Bacillus, e.g., subtilisin Novo, subtilisin Carlsberg, subtilisin 309, subtilisin
147 and subtilisin 168 (described in WO 89/06279). Examples of trypsin-like proteases are
trypsin (e.g. of porcine or bovine origin) and the Fusarium protease described in WO 89/06270 
and WO 94/25583.

Examples of useful proteases are the variants described in WO 92/19729, WO
98/20115, WO 98/20116, and WO 98/34946, especially the variants with substitutions in one
or more of the following positions: 27, 36, 57, 76, 87, 97, 101, 104, 120, 123, 167, 170, 194,
206, 218, 222, 224, 235 and 274.

Lipases: Suitable lipases include those of bacterial or fungal origin. Chemically modified or
protein engineered mutants are included. Examples of useful lipases include lipases from
Humicola (synonym Thermomyces), e.g. from H. lanuginosa (T. lanuginosus) as described in
EP 258 068 and EP 305 216 or from H. insolens as described in WO 96/13580, a
Pseudomonas lipase, e.g. from P. alcaligenes or P. pseudoalcaligenes (EP 218 272), P.
cepacia (EP 331 376), P. stutzeri (GB 1,372,034), P. fluorescens, Pseudomonas sp. strain SD
705 (WO 95/06720 and WO 96/27002), P. wisconsinensis (WO 96/12012), a Bacillus lipase,
e.g. from B. subtilis (Dartois et al. (1993), Biochimica et Biophysica Acta, 1131, 253-360), B.
stearothermophilus (JP 64/744992) or B. pumilus (WO 91/16422).

Other examples are lipase variants such as those described in WO 92/05249, WO
94/01541, EP 407 225, EP 260 105, WO 95/35381, WO 96/00292, WO 95/30744, WO
94/25578, WO 95/14783, WO 95/22615, WO 97/04079 and WO 97/07202.

Amylases: Suitable amylases (alpha and/or beta) include those of bacterial or fungal origin.
Chemically modified or protein engineered mutants are included. Amylases include, for
example, alpha-amylases obtained from Bacillus, e.g. a special strain of B. licheniformis,
described in more detail in GB 1,296,839.

Examples of useful amylases are the variants described in WO 94/02597, WO 
94/18314, WO 96/23873, and WO 97/43424, especially the variants with substitutions in one 
or more of the following positions: 15, 23, 105, 106, 124, 128, 133, 154, 156, 181, 188, 190, 
197, 202, 208, 209, 243, 264, 304, 305, 391, 408, and 444. 

Cellulases: Suitable cellulases include those of bacterial or fungal origin. Chemically modified
or protein engineered mutants are included. Suitable cellulases include cellulases from the
genera Bacillus, Pseudomonas, Humicola, Fusarium, Thielavia, Acremonium, e.g. the fungal
cellulases produced from Humicola insolens, Myceliophthora thermophila and Fusarium
oxysporum disclosed in US 4,435,307, US 5,648,263, US 5,691,178, US 5,776,757 and WO
89/09259.

Especially suitable cellulases are the alkaline or neutral cellulases having colour care 
benefits. Examples of such cellulases are cellulases described in EP 0 495 257, EP 0 531 372,
WO 96/11262, WO 96/29397, WO 98/08940. Other examples are cellulase variants such
as those described in WO 94/07998, EP 0 531 315, US 5,457,046, US 5,686,593, US
5,763,254, WO 95/24471, WO 98/12307 and PCT/DK98/00299.

Peroxidases/Oxidases: Suitable peroxidases/oxidases include those of plant, bacterial or 
fungal origin. Chemically modified or protein engineered mutants are included. Examples of
useful peroxidases include peroxidases from Coprinus, e.g. from C. cinereus, and variants
thereof as those described in WO 93/24618, WO 95/10602, and WO 98/15257. 

The detergent enzyme(s) may be included in a detergent composition by adding 
separate additives containing one or more enzymes, or by adding a combined additive 
comprising all of these enzymes. A detergent additive of the invention, i.e. a separate additive
or a combined additive, can be formulated e.g. as a granulate, a liquid, a slurry, etc. Preferred 
detergent additive formulations are granulates, in particular non-dusting granulates, liquids, in 
particular stabilized liquids, or slurries. 

Non-dusting granulates may be produced, e.g., as disclosed in US 4,106,991 and
4,661,452 and may optionally be coated by methods known in the art. Examples of waxy
coating materials are poly(ethylene oxide) products (polyethyleneglycol, PEG) with mean
molar weights of 1000 to 20000; ethoxylated nonylphenols having from 16 to 50 ethylene
oxide units; ethoxylated fatty alcohols in which the alcohol contains from 12 to 20 carbon
atoms and in which there are 15 to 80 ethylene oxide units; fatty alcohols; fatty acids; and
mono- and di- and triglycerides of fatty acids. Examples of film-forming coating materials
suitable for application by fluid bed techniques are given in GB 1483591. Liquid enzyme
preparations may, for instance, be stabilized by adding a polyol such as propylene glycol, a sugar
or sugar alcohol, lactic acid or boric acid according to established methods. Protected
enzymes may be prepared according to the method disclosed in EP 238,216.

The detergent composition of the invention may be in any convenient form, e.g., a bar, a
tablet, a powder, a granule, a paste or a liquid. A liquid detergent may be aqueous, typically
containing up to 70 % water and 0-30 % organic solvent, or non-aqueous.

The detergent composition comprises one or more surfactants, which may be non-ionic 
including semi-polar and/or anionic and/or cationic and/or zwitterionic. The surfactants are
typically present at a level of from 0.1% to 60% by weight.

When included therein the detergent will usually contain from about 1% to about 40% of 
an anionic surfactant such as linear alkylbenzenesulfonate, alpha-olefinsulfonate, alkyl sulfate 
(fatty alcohol sulfate), alcohol ethoxysulfate, secondary alkanesulfonate, alpha-sulfo fatty acid 
methyl ester, alkyl- or alkenylsuccinic acid or soap. 

When included therein the detergent will usually contain from about 0.2% to about 40%
of a non-ionic surfactant such as alcohol ethoxylate, nonylphenol ethoxylate,
alkylpolyglycoside, alkyldimethylamineoxide, ethoxylated fatty acid monoethanolamide, fatty
acid monoethanolamide, polyhydroxy alkyl fatty acid amide, or N-acyl N-alkyl derivatives of
glucosamine ("glucamides").

The detergent may contain 0-65 % of a detergent builder or complexing agent such as
zeolite, diphosphate, triphosphate, phosphonate, carbonate, citrate, nitrilotriacetic acid,
ethylenediaminetetraacetic acid, diethylenetriaminepentaacetic acid, alkyl- or alkenylsuccinic
acid, soluble silicates or layered silicates (e.g. SKS-6 from Hoechst).

The detergent may comprise one or more polymers. Examples are 
carboxymethylcellulose, poly(vinylpyrrolidone), poly(ethylene glycol), poly(vinyl alcohol),
poly(vinylpyridine-N-oxide), poly(vinylimidazole), polycarboxylates such as polyacrylates,
maleic/acrylic acid copolymers and lauryl methacrylate/acrylic acid copolymers.

The detergent may contain a bleaching system which may comprise a H2O2 source such
as perborate or percarbonate which may be combined with a peracid-forming bleach activator
such as tetraacetylethylenediamine or nonanoyloxybenzenesulfonate. Alternatively, the
bleaching system may comprise peroxyacids of e.g. the amide, imide, or sulfone type.

The enzyme(s) of the detergent composition of the invention may be stabilized using
conventional stabilizing agents, e.g., a polyol such as propylene glycol or glycerol, a sugar or
sugar alcohol, lactic acid, boric acid, or a boric acid derivative, e.g., an aromatic borate ester,
or a phenyl boronic acid derivative such as 4-formylphenyl boronic acid, and the composition
may be formulated as described in e.g. WO 92/19709 and WO 92/19708.

The detergent may also contain other conventional detergent ingredients such as e.g.
fabric conditioners including clays, foam boosters, suds suppressors, anti-corrosion agents,
soil-suspending agents, anti-soil redeposition agents, dyes, bactericides, optical brighteners,
hydrotropes, tarnish inhibitors, or perfumes.

It is at present contemplated that in the detergent compositions any enzyme, in particular 
the polypeptide of the invention, may be added in an amount corresponding to 0.01-100 mg of
enzyme protein per liter of wash liquor, preferably 0.05-5 mg of enzyme protein per liter of 
wash liquor, in particular 0.1-1 mg of enzyme protein per liter of wash liquor. 
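
As a rough illustration of the dosage ranges stated above, the following minimal Python sketch converts a dose per liter of wash liquor into a total enzyme mass and reports the narrowest stated range a dose falls into. The function names and the example wash volume are hypothetical and are not part of the specification; only the three ranges are taken from the text.

```python
# Dosage ranges quoted in the text, in mg enzyme protein per liter of wash liquor.
DOSAGE_RANGES_MG_PER_L = {
    "broad": (0.01, 100.0),
    "preferred": (0.05, 5.0),
    "particular": (0.1, 1.0),
}


def enzyme_mass_mg(dose_mg_per_l, wash_volume_l):
    """Total enzyme protein (mg) needed for one wash at the given dose.

    wash_volume_l is a hypothetical input; the text does not fix a volume.
    """
    return dose_mg_per_l * wash_volume_l


def narrowest_range(dose_mg_per_l):
    """Return the name of the narrowest quoted range containing the dose, or None."""
    for name in ("particular", "preferred", "broad"):
        low, high = DOSAGE_RANGES_MG_PER_L[name]
        if low <= dose_mg_per_l <= high:
            return name
    return None


if __name__ == "__main__":
    dose = 0.5      # mg/L, inside the narrowest stated range
    volume = 15.0   # L, an assumed wash volume for illustration only
    print(enzyme_mass_mg(dose, volume), narrowest_range(dose))
```

A dose outside 0.01-100 mg per liter returns None, which simply means it lies outside the amounts contemplated in the paragraph above.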

The polypeptide of the invention may additionally be incorporated in the detergent 
formulations disclosed in WO 97/07202 which is hereby incorporated as reference. 

DNA recombination (shuffling) 

The nucleotide sequences of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7,
SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID
NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ
ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41,
SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID
NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:62, SEQ
ID NO:63, SEQ ID NO:64, SEQ ID NO:65, SEQ ID NO:67 may be used in a DNA
recombination (or shuffling) process. The new polynucleotide sequences obtained in such a
process may encode new polypeptides having cellobiase activity with improved properties,
such as improved stability (storage stability, thermostability), improved specific activity, 
improved pH-optimum, and/or improved tolerance towards specific compounds.

Shuffling between two or more homologous input polynucleotides (starting-point
polynucleotides) involves fragmenting the polynucleotides and recombining the fragments, to
obtain output polynucleotides (i.e. polynucleotides that have been subjected to a shuffling
cycle) wherein a number of nucleotide fragments are exchanged in comparison to the input
polynucleotides.
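
The fragment-and-recombine idea described above can be illustrated in silico with a toy Python sketch. It is not one of the laboratory formats cited in this section: it assumes the parent sequences are already aligned and of equal length, cuts every parent at the same randomly chosen positions, and takes each segment from a randomly chosen parent. The function names, the number of crossovers, and the seed are hypothetical choices made for illustration.

```python
import random


def shuffle_parents(parents, n_crossovers, rng):
    """Build one chimeric sequence from aligned, equal-length parent sequences."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("toy model requires equal-length, pre-aligned parents")
    # Pick crossover points, then cut every parent at the same positions.
    cuts = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + cuts + [length]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        donor = rng.choice(parents)  # each segment comes from a random parent
        segments.append(donor[start:end])
    return "".join(segments)


def make_library(parents, size, n_crossovers=3, seed=0):
    """Generate a library of chimeric sequences (the output polynucleotides)."""
    rng = random.Random(seed)
    return [shuffle_parents(parents, n_crossovers, rng) for _ in range(size)]


if __name__ == "__main__":
    # Two artificial parents chosen so that the donor of each base is visible.
    parents = ["AAAAAAAAAAAA", "CCCCCCCCCCCC"]
    for chimera in make_library(parents, size=5):
        print(chimera)
```

Every position of each output sequence matches one of the parents at that position, which mirrors the statement that fragments are exchanged relative to the input polynucleotides.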

DNA recombination or shuffling may be a (partially) random process in which a library of 
chimeric genes is generated from two or more starting genes. A number of known formats can 
be used to carry out this shuffling or recombination process. 

The process may involve random fragmentation of parental DNA followed by reassembly 
by PCR to new full-length genes, e.g. as presented in US5605793, US5811238, US5830721,
US6117679. In-vitro recombination of genes may be carried out, e.g. as described in
US6159687, WO98/41623, US6159688, US5965408, US6153510. The recombination process
may take place in vivo in a living cell, e.g. as described in WO 97/07205 and WO 98/28416. 

The parental DNA may be fragmented by DNase I treatment or by restriction
endonuclease digests as described by Kikuchi et al (2000a, Gene 236:159-167). Shuffling of
two parents may be done by shuffling single stranded parental DNA of the two parents as 
described in Kikuchi et al (2000b, Gene 243:133-137). 

A particular method of shuffling is to follow the methods described in Crameri et al, 
1998, Nature, 391: 288-291 and Ness et al. Nature Biotechnology 17: 893-896. Another format 
would be the methods described in US 6159687: Examples 1 and 2.



Production of Ethanol from Biomass

The present invention also relates to methods for producing ethanol from biomass, such
as cellulosic materials, comprising contacting the biomass with the polypeptides of the
invention. Ethanol may subsequently be recovered. The polypeptides of the invention may be
produced "in-situ", i.e., as part of, or directly in an ethanol production process, by cultivating a
host cell or a strain, which in its wild-type form is capable of producing the polypeptides, under
conditions conducive for production of the polypeptides.

Ethanol can be produced by enzymatic degradation of biomass and conversion of the
released polysaccharides to ethanol. This kind of ethanol is often referred to as bioethanol or
biofuel. It can be used as a fuel additive or extender in blends of from less than 1% and up to
100% (a fuel substitute). In some countries, such as Brazil, ethanol is substituting gasoline to
a very large extent.

The predominant polysaccharide in the primary cell wall of biomass is cellulose, the 
second most abundant is hemi-cellulose, and the third is pectin. The secondary cell wall,
produced after the cell has stopped growing, also contains polysaccharides and is
strengthened through polymeric lignin covalently cross-linked to hemicellulose. Cellulose is a
homopolymer of anhydrocellobiose and thus a linear beta-(1-4)-D-glucan, while hemicelluloses
include a variety of compounds, such as xylans, xyloglucans, arabinoxylans, and mannans in
complex branched structures with a spectrum of substituents. Although generally
polymorphous, cellulose is found in plant tissue primarily as an insoluble crystalline matrix of
parallel glucan chains. Hemicelluloses usually hydrogen bond to cellulose, as well as to other
hemicelluloses, which helps stabilize the cell wall matrix. 

Three major classes of cellulase enzymes are used to break down biomass:
o The "endo-1,4-beta-glucanases" or 1,4-beta-D-glucan-4-glucanohydrolases (EC 3.2.1.4),
which act randomly on soluble and insoluble 1,4-beta-glucan substrates.
o The "exo-1,4-beta-D-glucanases" including both the 1,4-beta-D-glucan glucohydrolases
(EC 3.2.1.74), which liberate D-glucose from 1,4-beta-D-glucans and hydrolyze D-cellobiose
slowly, and 1,4-beta-D-glucan cellobiohydrolase (EC 3.2.1.91), also referred to
as cellobiohydrolase I, which liberates D-cellobiose from 1,4-beta-glucans.
o The "beta-D-glucosidases" or beta-D-glucoside glucohydrolases (EC 3.2.1.21), which act
to release D-glucose units from cellobiose and soluble cellodextrins, as well as an array of
glycosides.

These three classes of enzymes work together synergistically in a complex interplay that 
results in efficient decrystallization and hydrolysis of native cellulose from biomass to yield the 
reducing sugars which are converted to ethanol by fermentation.
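
For orientation, the theoretical ceiling on ethanol obtainable from cellulose by this route can be estimated from standard stoichiometry, which is general chemistry and is not stated in this specification: each anhydroglucose unit (C6H10O5) takes up one water on hydrolysis to give glucose (C6H12O6), and fermentation converts one glucose to two ethanol and two CO2. The Python sketch below computes that upper bound; the function name and the two efficiency parameters are hypothetical and are included only to show how incomplete hydrolysis or fermentation would scale the result.

```python
# Molar masses in g/mol (standard values, rounded to two decimals).
M_ANHYDROGLUCOSE = 162.14  # C6H10O5 repeat unit of cellulose
M_GLUCOSE = 180.16         # C6H12O6
M_ETHANOL = 46.07          # C2H5OH


def theoretical_ethanol_g(cellulose_g, hydrolysis_fraction=1.0,
                          fermentation_fraction=1.0):
    """Upper-bound ethanol mass (g) from a given mass of cellulose (g).

    hydrolysis_fraction and fermentation_fraction are assumed efficiencies
    between 0 and 1; they are illustrative knobs, not values from the text.
    """
    # Hydrolysis: each anhydroglucose unit gains one water to become glucose.
    glucose_g = cellulose_g * hydrolysis_fraction * M_GLUCOSE / M_ANHYDROGLUCOSE
    # Fermentation: one glucose yields two ethanol (plus two CO2).
    return glucose_g * fermentation_fraction * 2 * M_ETHANOL / M_GLUCOSE


if __name__ == "__main__":
    print(round(theoretical_ethanol_g(100.0), 1))
    print(round(theoretical_ethanol_g(100.0, 0.8, 0.9), 1))
```

With both fractions at 1.0 the result is about 0.57 g ethanol per gram of cellulose; real processes yield less.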

The present invention is further described by the following examples which should not be 
construed as limiting the scope of the invention. 

EXAMPLES

Chemicals used as buffers and substrates were commercial products of at least reagent 
grade. 

EXAMPLE 1 

Cloning of a partial and a full-length cellobiohydrolase I (CBH1) DNA sequence

A cDNA library of Diplodia gossypina was PCR screened for presence of the CBH1 gene. For
this purpose sets of primers were constructed, based on sequence alignment and
identification of conserved regions among CBH1 proteins. The PCR band from a gel
electrophoresis was used to obtain a partial sequence of the CBH1 gene from Diplodia
gossypina. Homology search confirmed that the partial sequence was a partial sequence of 
the CBH1 gene (EC 3.2.1.91).

The full-length CBH1 gene of Diplodia gossypina is obtained by accessing the patent deposit
CBS 247.96, making a DNA or cDNA preparation, using the partial sequence as the basis for
construction of specific primers, and using standard PCR cloning techniques to obtain the
entire gene step by step.

Several other approaches can be taken: 

o PCR screening of the cDNA library, or of the cDNAs that were used for the construction of
the library, could be performed. To do so, Gene Specific Primers (GSP) and
vector/adaptor primers are constructed from the partial cDNA sequence of the CBH1 gene
and from vector/adaptor sequence respectively; both sets of primers are designed to go
outward into the missing 5' and 3' regions of the CBH1 cDNA. The longest PCR products
obtained using combinations of GSP and vector/adaptor primer represent the full-length 5'
and 3' end regions of the CBH1 cDNA from Diplodia gossypina. Homology search and
comparison with the partial cDNA sequence confirm that the 5' and 3' PCR products
belong to the same CBH1 cDNA from Diplodia gossypina. The full-length cDNA can then
be obtained by PCR using a set of primers constructed from both the 5' and 3' ends.

o Alternatively, the cDNA library could be screened for the full-length cDNA using standard
hybridization techniques and the partial cDNA sequence as a probe. The clones giving a
positive hybridization signal with the probe are then purified and sequenced to determine
the longest cDNA sequence. Homology search and comparison confirm that the full-length
cDNA corresponds to the partial CBH1 cDNA sequence that was originally used as a
probe.

The two approaches described above rely on the presence of the full-length CBH1 cDNA in 
the cDNA library or in the cDNAs used for its construction. Alternatively, the 5' and 3' RACE 
(Rapid Amplification of cDNA Ends) techniques or derived techniques could be used to identify 
the missing 5' and 3' regions. For this purpose, preferably mRNAs from Diplodia gossypina are
isolated and utilized to synthesize first strand cDNAs using an oligo(dT)-containing Adapter
Primer or a 5'-Gene Specific Primer (GSP).
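
The orientation of the outward-facing gene-specific primers used in the PCR screening and RACE approaches above can be sketched in Python. This is a minimal illustration, assuming the partial cDNA is given as a sense-strand string written 5' to 3'; the function names and the primer length are hypothetical, and real primer design would also weigh melting temperature, GC content, and specificity, none of which is modeled here.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def outward_gsps(partial_cdna, primer_len=20):
    """Return (gsp_5prime, gsp_3prime), both written 5'->3'.

    partial_cdna is the known sense-strand fragment; primer_len is an
    assumed length chosen for illustration.
    """
    if len(partial_cdna) < primer_len:
        raise ValueError("partial sequence is shorter than the primer length")
    # Antisense primer at the 5' edge: extends upstream into the missing 5' region.
    gsp_5prime = reverse_complement(partial_cdna[:primer_len])
    # Sense primer at the 3' edge: extends downstream into the missing 3' region.
    gsp_3prime = partial_cdna[-primer_len:]
    return gsp_5prime, gsp_3prime


if __name__ == "__main__":
    # Artificial sequence, not a CBH1 sequence.
    print(outward_gsps("ATGGCCAAATTTGGGCCC", primer_len=6))
```

Each primer would then be paired with a vector/adaptor primer on the opposite side of the missing region.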

The full-length cDNA of the CBH1 gene from Diplodia gossypina can also be obtained by 
using genomic DNA from Diplodia gossypina. The CBH1 gene can be identified by PCR
techniques such as the one described above or by standard genomic library screening using
hybridization techniques and the partial CBH1 cDNA as a probe. Homology search and
comparison with the partial CBH1 cDNA confirm that the genomic sequence corresponds to
the CBH1 gene from Diplodia gossypina. Identification of consensus sequences such as
initiation site of transcription, start and stop codons or polyA sites could be used to define the
region comprising the full-length cDNA. Primers constructed from both the 5' and 3' ends of
this region could then be used to amplify the full-length cDNA from mRNA or cDNA library
from Diplodia gossypina (see above). 
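As a rough illustration of how start and stop codons can delimit a candidate coding region, the following Python sketch scans the three forward reading frames for the longest ATG-to-stop open reading frame. It is a simplification under stated assumptions: the function name longest_orf is hypothetical, only the given strand is searched, and introns, transcription initiation sites and polyA signals are ignored, so it would apply to an intron-free sequence rather than to fungal genomic DNA in general.

```python
# Hypothetical helper: longest ATG..stop open reading frame on one strand.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str):
    """Return (start, end) of the longest ORF, end exclusive, or None.

    Coordinates are 0-based; the stop codon is included in the span.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


if __name__ == "__main__":
    toy = "CCATGAAATTTGGGTAACC"  # toy sequence for illustration only
    span = longest_orf(toy)
    print(span, toy[span[0]:span[1]])
```

Primers for amplifying the full-length cDNA would then be placed at or just outside the two ends of the reported span.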

By expressing the full-length gene in a suitable expression host construct, the CBH1 enzyme
can be harvested from the culture broth as an intracellular or extracellular enzyme.


The methods described above apply to the cloning of cellobiohydrolase I DNA sequences from 
all organisms and not only Diplodia gossypina. 



EXAMPLE 2

Cellobiohydrolase I (CBH I) Activity 

A cellobiohydrolase I is characterized by the ability to hydrolyze highly crystalline
cellulose very efficiently compared to other cellulases. Cellobiohydrolase I may have a higher
catalytic activity using PASC (phosphoric acid swollen cellulose) as substrate than using CMC
as substrate. For the purposes of the present invention, any of the following assays can be
used to identify a cellobiohydrolase I:

Activity on Azo-Avicel 

Azo-Avicel (Megazyme, Bray Business Park, Bray, Wicklow, Ireland) was used according to
the manufacturer's instructions.

Activity on PNP-beta-cellobiose 

Substrate solution: 5 mM PNP beta-D-cellobiose (p-nitrophenyl β-D-cellobioside, Sigma N-5759)
in 0.1 M Na-acetate buffer, pH 5.0;

Stop reagent: 0.1 M Na-carbonate, pH 11.5.

50 µL CBH I solution was mixed with 1 mL substrate solution and incubated for 20 minutes
at 40°C. The reaction was stopped by addition of 5 mL stop reagent. Absorbance was
measured at 404 nm.


Activity on PASC and CMC 

The substrate is degraded with cellobiohydrolase I (CBH I) to form reducing sugars. A
Microdochium nivale carbohydrate oxidase (rMnO) or another equivalent oxidase acts on the
reducing sugars to form H2O2 in the presence of O2. In the presence of excess peroxidase, the
H2O2 formed activates the oxidative condensation of 4-aminoantipyrine (AA) and
N-ethyl-N-sulfopropyl-m-toluidine (TOPS) to form a purple product which can be quantified by its
absorbance at 550 nm.

When all components except CBH I are in surplus, the rate of increase in absorbance is 
proportional to the CBH I activity. The reaction is a one-kinetic-step reaction and may be 
carried out automatically in a Cobas Fara centrifugal analyzer (Hoffmann La Roche) or 
another equivalent spectrophotometer which can measure steady state kinetics. 


Buffer: 50 mM Na-acetate buffer (pH 5.0);

Reagents: rMnO oxidase, purified Microdochium nivale carbohydrate oxidase, 2 mg/L (final concentration);
Peroxidase, SIGMA P-8125 (96 U/mg), 25 mg/L (final concentration);
4-aminoantipyrine, SIGMA A-4382, 200 mg/L (final concentration);
TOPS, SIGMA E-8506, 600 mg/L (final concentration);
PASC or CMC (see below), 5 g/L (final concentration).

All reagents were added to the buffer in the concentrations indicated above and this
reagent solution was mixed thoroughly.
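For bench preparation, the listed final concentrations can be converted into amounts per batch of reagent solution. The Python sketch below is only a convenience calculation: it assumes the stated concentrations refer to the reagent solution itself (before the sample is added), and the function name reagent_masses_mg and the 100 mL batch size are illustrative choices, not values from the description.

```python
# Final concentrations from the reagent list, expressed in mg/L.
FINAL_CONC_MG_PER_L = {
    "rMnO oxidase": 2.0,
    "peroxidase": 25.0,
    "4-aminoantipyrine": 200.0,
    "TOPS": 600.0,
    "PASC or CMC": 5000.0,  # 5 g/L
}


def reagent_masses_mg(volume_ml: float) -> dict:
    """Return the mass of each reagent (mg) for a batch of the given volume."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    litres = volume_ml / 1000.0
    return {name: conc * litres for name, conc in FINAL_CONC_MG_PER_L.items()}


if __name__ == "__main__":
    for name, mass in reagent_masses_mg(100.0).items():
        print(f"{name}: {mass:.2f} mg")
```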
50 µL cellobiohydrolase I sample (in a suitable dilution) was mixed with 300 µL reagent
solution and incubated for 20 minutes at 40°C. Purple color formation was detected and
measured as absorbance at 550 nm.

The AA/TOPS-condensate absorption coefficient is 0.01935 A550/(µM cm). The rate is
calculated as µmoles reducing sugar produced per minute from OD550/minute and the
absorption coefficient.
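The rate calculation can be sketched in Python as follows. The absorption coefficient of 0.01935 A550/(µM cm) is taken from the text; everything else is an assumption made for illustration: a 1 cm light path, a 350 µL reaction volume (50 µL sample plus 300 µL reagent solution), a 1:1 correspondence between condensate formed and reducing sugar produced, and the function names themselves. The slope is obtained by an ordinary least-squares fit of absorbance against time.

```python
ABS_COEFF_PER_UM_CM = 0.01935  # A550 per (µM·cm), as stated in the text


def slope_per_min(times_min, a550):
    """Least-squares slope of absorbance versus time (A550 per minute)."""
    n = len(times_min)
    if n != len(a550) or n < 2:
        raise ValueError("need at least two paired readings")
    mean_t = sum(times_min) / n
    mean_a = sum(a550) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_min, a550))
    return sxy / sxx


def reducing_sugar_rate_umol_per_min(slope, path_cm=1.0, volume_ul=350.0):
    """Convert an absorbance slope into µmol reducing sugar per minute.

    path_cm and volume_ul are assumed defaults, not values fixed by the text.
    """
    conc_rate_um_per_min = slope / (ABS_COEFF_PER_UM_CM * path_cm)
    return conc_rate_um_per_min * volume_ul * 1e-6  # µM x L = µmol


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 3.0]
    readings = [0.000, 0.019, 0.039, 0.058]  # made-up example readings
    s = slope_per_min(times, readings)
    print(s, reducing_sugar_rate_umol_per_min(s))
```

Multiplying the result by the dilution factor of the sample gives the rate for the undiluted enzyme preparation.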

PASC:

Materials: 5 g Avicel® (Art. 2331 Merck);
150 mL 85% ortho-phosphoric acid (Art. 573 Merck);
800 mL acetone (Art. 14 Merck);
Approx. 2 liters deionized water (Milli-Q);
1 L glass beaker;
1 L glass filter funnel;
2 L suction flask;
Ultra Turrax Homogenizer.

Acetone and ortho-phosphoric acid are cooled on ice. Avicel® is moistened with water, and
then the 150 mL ice-cold 85% ortho-phosphoric acid is added. The mixture is placed on an
ice bath with weak stirring for one hour.

Add 500 mL ice-cold acetone with stirring, transfer the mixture to a glass filter funnel
and wash with 3 x 100 mL ice-cold acetone, sucking as dry as possible in each wash. Wash with
2 x 500 mL water (or until there is no odor of acetone), sucking as dry as possible in each wash.
Re-suspend the solids in water to a total volume of 500 mL, and blend to homogeneity
using an Ultra Turrax Homogenizer. Store wet in a refrigerator and equilibrate with buffer by
centrifugation and re-suspension before use.

CMC:

Bacterial cellulose microfibrils in an impure form were obtained from the Japanese
foodstuff "nata de coco" (Fujico Company, Japan). The cellulose in 350 g of this product was
purified by suspension of the product in about 4 L of tap water. This water was replaced by
fresh water twice a day for 4 days.

Then 1% (w/v) NaOH was used instead of water and the product was re-suspended in
the alkali solution twice a day for 4 days. Neutralisation was done by rinsing the purified
cellulose with distilled water until the pH at the surface of the product was neutral (pH 7).

The cellulose was microfibrillated and a suspension of individual bacterial cellulose
microfibrils was obtained by homogenisation of the purified cellulose microfibrils in a Waring
blender for 30 min. The cellulose microfibrils were further purified by dialysing this suspension
through a pore membrane against distilled water, and the isolated and purified cellulose
microfibrils were stored in a water suspension at 4°C.



Deposit of Biological Material

China General Microbiological Culture Collection Center (CGMCC) 

The following biological material has been deposited under the terms of the Budapest 
Treaty with the China General Microbiological Culture Collection Center (CGMCC), Institute of 
Microbiology, Chinese Academy of Sciences, Haidian, Beijing 100080, China: 



Accession Number: CGMCC No. 0584
Applicant's reference: ND000575
Date of Deposit: 2001-05-29
Description: Acremonium thermophilum CBH I gene on plasmid
Classification: Ascomycota; Sordariomycetes; Hypocreales; Hypocreaceae
Origin: China, 1999
Related sequence(s): SEQ ID NO:1 and SEQ ID NO:2 (DNA sequence encoding a cellobiohydrolase I from Acremonium thermophilum and the corresponding protein sequence)



Accession Number: CGMCC No. 0581
Applicant's reference: ND000548
Date of Deposit: 2001-05-29
Description: Chaetomium thermophilum CBH I gene on plasmid
Classification: Ascomycota; Sordariomycetes; Sordariales; Chaetomiaceae
Origin: China, 1999
Related sequence(s): SEQ ID NO:3 and SEQ ID NO:4 (DNA sequence encoding a cellobiohydrolase I from Chaetomium thermophilum and the corresponding protein sequence)



Accession Number: CGMCC No. 0585
Applicant's reference: ND001223
Date of Deposit: 2001-05-29
Description: Scytalidium sp. CBH I gene on plasmid
Classification: Ascomycota; Mitosporic
Origin: China, 1999
Related sequence(s): SEQ ID NO:5 and SEQ ID NO:6 (DNA sequence encoding a cellobiohydrolase I from Scytalidium sp. and the corresponding protein sequence)



Accession Number: CGMCC No. 0582
Applicant's reference: ND000549
Date of Deposit: 2001-05-29
Description: Thermoascus aurantiacus CBH I gene on plasmid
Classification: Eurotiomycetes; Eurotiales; Trichocomaceae
Origin: China
Related sequence(s): SEQ ID NO:7 and SEQ ID NO:8 (DNA sequence encoding a cellobiohydrolase I from Thermoascus aurantiacus and the corresponding protein sequence)



Accession Number: CGMCC No. 0583
Applicant's reference: ND001182
Date of Deposit: 2001-05-29
Description: Thielavia australiensis CBH I gene on plasmid
Classification: Ascomycota; Sordariomycetes; Sordariales; Chaetomiaceae
Origin: China, 1998
Related sequence(s): SEQ ID NO:9 and SEQ ID NO:10 (DNA sequence encoding a cellobiohydrolase I from Thielavia australiensis and the corresponding protein sequence)

Accession Number: CGMCC No. 0580
Applicant's reference: ND000562
Date of Deposit: 2001-05-29
Description: Melanocarpus albomyces CBH I gene on plasmid
Classification: Ascomycota; Sordariomycetes; Sordariales
Origin: China, 1999
Related sequence(s): SEQ ID NO:15 and SEQ ID NO:16 (DNA sequence encoding a cellobiohydrolase I from Melanocarpus albomyces and the corresponding protein sequence)

Accession Number: CGMCC No. 0748
Applicant's reference: ND001181
Date of Deposit: 2002-06-07
Description: Acremonium sp. CBH I gene on plasmid
Classification: mitosporic Ascomycetes
Origin: China, 2000
Related sequence(s): SEQ ID NO:53 and SEQ ID NO:54

Accession Number: CGMCC No. 0749
Applicant's reference: ND000577
Date of Deposit: 2002-06-07
Description: Chaetomidium pingtungium CBH I gene on plasmid
Classification: Chaetomiaceae, Sordariales, Ascomycota
Origin: China, 2000
Related sequence(s): SEQ ID NO:55 and SEQ ID NO:56

Accession Number: CGMCC No. 0747
Applicant's reference: ND001175
Date of Deposit: 2002-06-07
Description: Sporotrichum pruinosum CBH I gene on plasmid
Classification: Meruliaceae, Stereales, Basidiomycota
Origin: China, 2000
Related sequence(s): SEQ ID NO:57 and SEQ ID NO:58



Accession Number: CGMCC No. 0750
Applicant's reference: ND000571
Date of Deposit: 2002-06-07
Description: Scytalidium thermophilum CBH I gene on plasmid
Classification: Ascomycota; Mitosporic
Origin: China, 2000
Related sequence(s): SEQ ID NO:59 and SEQ ID NO:60




Centraalbureau Voor Schimmelcultures (CBS) 

The following biological material has been deposited under the terms of the Budapest 
Treaty with the Centraalbureau Voor Schimmelcultures (CBS), Uppsalalaan 8, 3584 CT 
Utrecht, The Netherlands (alternatively P.O.Box 85167, 3508 AD Utrecht, The Netherlands): 



Accession Number: CBS 109513
Applicant's reference: ND000538
Date of Deposit: 2001-06-01
Description: Verticillium tenerum
Classification: Ascomycota, Hypocreales, Pyrenomycetes (mitosporic)
Origin:
Related sequence(s): SEQ ID NO:11 and SEQ ID NO:12 (DNA sequence encoding a cellobiohydrolase I from Verticillium tenerum and the corresponding protein sequence)



Accession Number: CBS 819.73
Applicant's reference: ND000533
Date of Deposit: Publicly available (not deposited by applicant)
Description: Humicola nigrescens
Classification: Sordariaceae, Sordariales, Sordariomycetes; Ascomycota
Origin:
Related sequence(s): SEQ ID NO:18 (partial DNA sequence encoding a cellobiohydrolase I from Humicola nigrescens)



Accession Number: CBS 427.97
Applicant's reference: ND000530
Date of Deposit: 1997-01-23
Description: Cladorrhinum foecundissimum
Classification: Sordariaceae, Sordariales, Sordariomycetes; Ascomycota
Origin: Jamaica
Related sequence(s): SEQ ID NO:19 (partial DNA sequence encoding a cellobiohydrolase I from Cladorrhinum foecundissimum)

Accession Number: CBS 247.96
Applicant's reference: ND000534 and ND001231
Date of Deposit: 1996-03-12
Description: Diplodia gossypina
Classification: Dothideaceae, Dothideales, Dothideomycetes, Ascomycota
Origin: Indonesia, 1992
Related sequence(s): SEQ ID NO:20 (partial DNA sequence encoding a cellobiohydrolase I from Diplodia gossypina), SEQ ID NO:37 (full DNA sequence encoding a cellobiohydrolase I from Diplodia gossypina) and SEQ ID NO:38 (full cellobiohydrolase I protein sequence from Diplodia gossypina)

Accession Number: CBS 117.65
Applicant's reference: ND000536
Date of Deposit: Publicly available
Description: Myceliophthora thermophila
Classification: Sordariaceae, Sordariales, Sordariomycetes; Ascomycota
Origin:
Related sequence(s): SEQ ID NO:21 (partial DNA sequence encoding a cellobiohydrolase I from Myceliophthora thermophila)

Accession Number: CBS 109471
Applicant's reference: ND000537
Date of Deposit: 2001-05-29
Description: Rhizomucor pusillus
Classification: Mucoraceae, Mucorales, Zygomycota
Origin: Denmark
Related sequence(s): SEQ ID NO:22 (partial DNA sequence encoding a cellobiohydrolase I from Rhizomucor pusillus)

Accession Number: CBS 521.95
Applicant's reference: ND000542
Date of Deposit: 1995-07-04
Description: Meripilus giganteus
Classification: Rigidiporaceae, Hymenomycetales, Basidiomycota
Origin: Denmark, 1993
Related sequence(s): SEQ ID NO:23 (partial DNA sequence encoding a cellobiohydrolase I from Meripilus giganteus)



Accession Number: CBS 277.96
Applicant's reference: ND000543, ND001346 and ND001243
Date of Deposit: 1996-03-12
Description: Exidia glandulosa
Classification: Exidiaceae, Auriculariales, Hymenomycetes, Basidiomycota
Origin: Denmark, 1993
Related sequence(s): SEQ ID NO:24 (partial DNA sequence encoding a cellobiohydrolase I from Exidia glandulosa), SEQ ID NO:45 (full DNA sequence encoding a cellobiohydrolase I with CBD from Exidia glandulosa), SEQ ID NO:46 (full cellobiohydrolase I protein sequence with CBD from Exidia glandulosa), SEQ ID NO:47 (full DNA sequence encoding a cellobiohydrolase I from Exidia glandulosa) and SEQ ID NO:48 (full cellobiohydrolase I protein sequence from Exidia glandulosa)



Accession Number: CBS 284.96
Applicant's reference: ND000544 and ND001235
Date of Deposit: 1996-03-12
Description: Xylaria hypoxylon
Classification: Sordariaceae, Sordariales, Sordariomycetes, Ascomycota
Origin: Denmark, 1993
Related sequence(s): SEQ ID NO:25 (partial DNA sequence encoding a cellobiohydrolase I from Xylaria hypoxylon), SEQ ID NO:43 (full DNA sequence encoding a cellobiohydrolase I from Xylaria hypoxylon) and SEQ ID NO:44 (full cellobiohydrolase I protein sequence from Xylaria hypoxylon)

Accession Number: CBS 804.70
Applicant's reference: ND001227
Date of Deposit: Publicly available
Description: Trichophaea saccata
Classification: Ascomycota; Pezizomycetes; Pezizales; Pyronemataceae
Related sequence(s): SEQ ID NO:36 (partial DNA sequence encoding a cellobiohydrolase I from Trichophaea saccata)



Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSMZ)

The following biological material has been deposited under the terms of the Budapest 
Treaty with the Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSMZ), 
Mascheroder Weg 1b, 38124 Braunschweig, Germany: 



Accession Number: DSM 14348
Applicant's reference: ND000551
Date of Deposit: 2001-06-13
Description: Neotermes castaneus, termite CBH I gene on plasmid
Classification:
Origin: Cultures of termite larvae bought from BAM, Germany, 1999
Related sequence(s): SEQ ID NO:13 and SEQ ID NO:14 (DNA sequence encoding a cellobiohydrolase I from gut cells or microbes from the gut of Neotermes castaneus and the corresponding protein sequence)



Accession Number: DSM 15066
Applicant's reference: ND001349
Date of Deposit: 2002-06-21
Description: Poitrasia circinans CBH I gene on plasmid
Classification: Choanephoraceae, Zygomycota, Mucorales
Origin:
Related sequence(s): SEQ ID NO:49 (DNA sequence encoding a cellobiohydrolase I from Poitrasia circinans) and SEQ ID NO:50 (cellobiohydrolase I protein sequence from Poitrasia circinans)



Accession Number: DSM 15065
Applicant's reference: ND001339
Date of Deposit: 2002-06-21
Description: Coprinus cinereus CBH I gene on plasmid
Classification: Basidiomycota, Hymenomycetes, Agaricales, Agaricaceae
Origin: Denmark
Related sequence(s): SEQ ID NO:51 (DNA sequence encoding a cellobiohydrolase I from Coprinus cinereus) and SEQ ID NO:52 (cellobiohydrolase I protein sequence from Coprinus cinereus)



Accession Number: DSM 15064
Applicant's reference: ND001264
Date of Deposit: 2002-06-21
Description: Trichophaea saccata CBH I gene on plasmid
Classification: Ascomycota; Pezizomycetes; Pezizales; Pyronemataceae
Origin:
Related sequence(s): SEQ ID NO:39 (DNA sequence encoding a cellobiohydrolase I from Trichophaea saccata) and SEQ ID NO:40 (cellobiohydrolase I protein sequence from Trichophaea saccata)



Accession Number: DSM 15067
Applicant's reference: ND001232
Date of Deposit: 2002-06-21
Description: Myceliophthora thermophila CBH I gene on plasmid
Classification: Sordariaceae, Sordariales, Sordariomycetes; Ascomycota
Origin:
Related sequence(s): SEQ ID NO:41 (DNA sequence encoding a cellobiohydrolase I from Myceliophthora thermophila) and SEQ ID NO:42 (cellobiohydrolase I protein sequence from Myceliophthora thermophila)



Institute for Fermentation, Osaka (IFO)

The following biological material has been deposited under the terms of the Budapest
Treaty with the Institute for Fermentation, Osaka (IFO), 17-85, Juso-honmachi 2-chome,
Yodogawa-ku, Osaka 532-8686, Japan:

Accession Number: IFO 5372
Applicant's reference: ND000531
Date of Deposit: Publicly available (not deposited by applicant)
Description: Trichothecium roseum
Classification: mitosporic Ascomycetes
Origin:
Related sequence(s): SEQ ID NO:17 (partial DNA sequence encoding a cellobiohydrolase I from Trichothecium roseum)

The deposits of CBS 427.97, CBS 247.96, CBS 521.95, CBS 284.96 and CBS 274.96 were made
by Novo Nordisk A/S and were later assigned to Novozymes A/S.

Microorganism(s) or Other Biological

0-2


International Application No.

0-3    Applicant's or agent's file reference

1      The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
1-1    page: 63-64
1-2    line: 31-2
1-3    Identification of Deposit
1-3-1  Name of depositary institution: China General Microbiological Culture Collection Center
1-3-2  Address of depositary institution: China Committee for Culture Collection of Microorganisms, P.O. Box 2714, Beijing 100080, China
1-3-3  Date of deposit: 29 May 2001 (29.05.2001)
1-3-4  Accession Number: CGMCC 0584
1-4    Additional Indications: NONE
       Designated States for Which Indications are Made: all designated States
       Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE

       The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
2-1    page: 64
2-2    line:
2-3    Identification of Deposit
2-3-1  Name of depositary institution: China General Microbiological Culture Collection Center
2-3-2  Address of depositary institution: China Committee for Culture Collection of Microorganisms, P.O. Box 2714, Beijing 100080, China
2-3-3  Date of deposit: 29 May 2001 (29.05.2001)
2-3-4  Accession Number: CGMCC 0581
2-4    Additional Indications: NONE
2-5    Designated States for Which Indications are Made: all designated States
2-6    Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE

The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
page:
line: 14-22
Identification of Deposit
Name of depositary institution: China General Microbiological Culture Collection Center
Address of depositary institution: China Committee for Culture Collection of Microorganisms, P.O. Box 2714, Beijing 100080, China
Date of deposit: 29 May 2001 (29.05.2001)
Accession Number: CGMCC 0585
Additional Indications: NONE
Designated States for Which Indications are Made: all designated States
Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE



The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
page:
line: 24-32
Identification of Deposit
Name of depositary institution: China General Microbiological Culture Collection Center
Address of depositary institution: China Committee for Culture Collection of Microorganisms, P.O. Box 2714, Beijing 100080, China
Date of deposit: 29 May 2001 (29.05.2001)
Accession Number: CGMCC 0582
Additional Indications: NONE
Designated States for Which Indications are Made: all designated States
Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE



       The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
       page: 64-65
       line: 34-5
5-3    Identification of Deposit
5-3-1  Name of depositary institution: China General Microbiological Culture Collection Center
5-3-2  Address of depositary institution: China Committee for Culture Collection of Microorganisms, P.O. Box 2714, Beijing 100080, China
5-3-3  Date of deposit: 29 May 2001 (29.05.2001)
5-3-4  Accession Number: CGMCC 0583
5-4    Additional Indications: NONE
5-5    Designated States for Which Indications are Made: all designated States
5-6    Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE


6      The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
6-1    page: 65
6-2    line: 7-15
6-3    Identification of Deposit
6-3-1  Name of depositary institution: China General Microbiological Culture Collection Center
6-3-2  Address of depositary institution: China Committee for Culture Collection of Microorganisms, P.O. Box 2714, Beijing 100080, China
6-3-3  Date of deposit: 29 May 2001 (29.05.2001)
6-3-4  Accession Number: CGMCC 0580
6-4    Additional Indications: NONE
6-5    Designated States for Which Indications are Made: all designated States
6-6    Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE


7      The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
7-1    page: 65
7-2    line:
7-3    Identification of Deposit
7-3-1  Name of depositary institution: Centre General Chinois de Cultures Microbiologiques
7-3-2  Address of depositary institution: Chine - Comite pour la collection de cultures de micro-organismes, P.O. Box 2714, Beijing 100080
7-3-3  Date of deposit: 07 June 2002 (07.06.2002)
7-3-4  Accession Number: CGCCM 0748
7-4    Additional Indications: NONE
       Designated States for Which Indications are Made: all designated States
       Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE



The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
page: 65
line: 25-31
Identification of Deposit
Name of depositary institution: Centre General Chinois de Cultures Microbiologiques
Address of depositary institution: Chine - Comite pour la collection de cultures de micro-organismes, P.O. Box 2714, Beijing 100080
Date of deposit: 07 June 2002 (07.06.2002)
Accession Number: CGCCM 0749
Additional Indications: NONE
Designated States for Which Indications are Made: all designated States
Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE



The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
page: 65-66
line: 33-2
Identification of Deposit
Name of depositary institution: Centre General Chinois de Cultures Microbiologiques
Address of depositary institution: Chine - Comite pour la collection de cultures de micro-organismes, P.O. Box 2714, Beijing 100080
Date of deposit: 07 June 2002 (07.06.2002)
Accession Number: CGCCM 0747
Additional Indications: NONE
Designated States for Which Indications are Made: all designated States
Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE



       The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
       page: 66
       line: 4-10
10-3   Identification of Deposit
10-3-1 Name of depositary institution: Centre General Chinois de Cultures Microbiologiques
10-3-2 Address of depositary institution: Chine - Comite pour la collection de cultures de micro-organismes, P.O. Box 2714, Beijing 100080
10-3-3 Date of deposit: 07 June 2002 (07.06.2002)
10-3-4 Accession Number: CGCCM 0750
       Additional Indications: NONE
10-5   Designated States for Which Indications are Made: all designated States
10-6   Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE


11     The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
11-1   page: 66
11-2   line: 18-26
11-3   Identification of Deposit
11-3-1 Name of depositary institution: Centraalbureau voor Schimmelcultures
11-3-2 Address of depositary institution: Uppsalalaan 8, NL-3584 CT Utrecht, The Netherlands / P.O. Box 85167, NL-3508 AD Utrecht, The Netherlands
11-3-3 Date of deposit: 01 June 2001 (01.06.2001)
11-3-4 Accession Number: CBS 109513
11-4   Additional Indications: NONE
11-5   Designated States for Which Indications are Made: all designated States
11-6   Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE


12     The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
12-1   page: 66-67
12-2   line: 37-7
12-3   Identification of Deposit
12-3-1 Name of depositary institution: Centraalbureau voor Schimmelcultures
12-3-2 Address of depositary institution: Uppsalalaan 8, NL-3584 CT Utrecht, The Netherlands / P.O. Box 85167, NL-3508 AD Utrecht, The Netherlands
12-3-3 Date of deposit: 23 January 1997 (23.01.1997)
12-3-4 Accession Number: CBS 427.97
12-4   Additional Indications: NONE
12-5   Designated States for Which Indications are Made: all designated States
12-6   Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE


13     The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
13-1   page: 67
13-2   line: 9-19
13-3   Identification of Deposit
13-3-1 Name of depositary institution: Centraalbureau voor Schimmelcultures
13-3-2 Address of depositary institution: Uppsalalaan 8, NL-3584 CT Utrecht, The Netherlands / P.O. Box 85167, NL-3508 AD Utrecht, The Netherlands
13-3-3 Date of deposit: 12 March 1996 (12.03.1996)
13-3-4 Accession Number: CBS 247.96
13-4   Additional Indications: NONE
13-5   Designated States for Which Indications are Made: all designated States
13-6   Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE


14     The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
14-1   page: 67
14-2   line: 30-37
       Identification of Deposit
14-3-1 Name of depositary institution: Centraalbureau voor Schimmelcultures
14-3-2 Address of depositary institution: Uppsalalaan 8, NL-3584 CT Utrecht, The Netherlands / P.O. Box 85167, NL-3508 AD Utrecht, The Netherlands
14-3-3 Date of deposit: 29 May 2001 (29.05.2001)
14-3-4 Accession Number: CBS 109471
14-4   Additional Indications: NONE
14-5   Designated States for Which Indications are Made: all designated States
       Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE


15     The indications made below relate to the deposited microorganism(s) or other biological material referred to in the description on:
15-1   page: 68
15-2   line: 2-9
15-3   Identification of Deposit
15-3-1 Name of depositary institution: Centraalbureau voor Schimmelcultures
15-3-2 Address of depositary institution: Uppsalalaan 8, NL-3584 CT Utrecht, The Netherlands / P.O. Box 85167, NL-3508 AD Utrecht, The Netherlands
15-3-3 Date of deposit: 04 July 1995 (04.07.1995)
15-3-4 Accession Number: CBS 521.95
15-4   Additional Indications: NONE
15-5   Designated States for Which Indications are Made: all designated States
15-6   Separate Furnishing of Indications. These indications will be submitted to the International Bureau later: NONE


16 

16-1 
16-2 


The indications made below relate to 
the deposited microorganism(s) or 
other biological material referred to 
in the description on: 
page 

line 


68 

26-36 


16-3 

16-3-1 

16-3-2 

16-3-3 
16-3-4 


Identification of Deposit 
Name of depositary institution 
Address of depositary institution 

Date of deposit 
Accession Number 


Centraalbureau voor Schimmelcultures 

Uppsalalaan 8, NL-3584 CT Utrecht, The
Netherlands / P.O. Box 85167, NL-3508 AD
Utrecht, The Netherlands
12 March 1996 (12.03.1996) 
CBS 284.96 


16-4 


Additional Indications 


NONE 


16-5 


Designated States for Which 
Indications are Made 


all designated States 


16-6 


Separate Furnishing of Indications

These indications will be submitted to 
the International Bureau later 


NONE 


17 

17-1 
17-2 


The indications made below relate to 
the deposited microorganism(s) or
other biological material referred to 
in the description on: 
page 

line


68 

11-24 


17-3 

17-3-1 

17-3-2 

17-3-3 
17-3-4 


Identification of Deposit 
Name of depositary institution
Address of depositary institution

Date of deposit 
Accession Number 


Centraalbureau voor Schimmelcultures 
Uppsalalaan 8, NL-3584 CT Utrecht, The
Netherlands / P.O. Box 85167, NL-3508 AD
Utrecht, The Netherlands 
12 March 1996 (12.03.1996) 
CBS 277.96 


17-4 


Additional Indications 


NONE 


17-5


Designated States for Which 
Indications are Made 


all designated States 






17-6 


Separate Furnishing of Indications 

These indications will be submitted to 
the International Bureau later 


NONE 


18


The indications made below relate to 
the deposited microorganism(s) or 
other biological material referred to 
in the description on: 




18-1 


page 


O 27 


18-2


line 


15-23 


18-3 


Identification of Deposit 




18-3-1


Name of depositary institution 


DSMZ-Deutsche Sammlung von
Mikroorganismen und Zellkulturen GmbH 


18-3-2 


Address of depositary institution 


Mascheroder Weg 1b, D-38124
Braunschweig, Germany


18-3-3 


Date of deposit 


13 June 2001 (13.06.2001) 


18-3-4 


Accession Number 


DSMZ 14348


18-4 


Additional Indications 


NONE 


18-5 


Designated States for Which 
Indications are Made 


all designated States 


18-6 


Separate Furnishing of Indications 

These indications will be submitted to 
the International Bureau later 


NONE 


19


The indications made below relate to 
the deposited microorganism(s) or
other biological material referred to
in the description on:




19-1 


page 




19-2 


line 


25-33 


19-3 


Identification of Deposit 




19-3-1 


Name of depositary institution 


DSMZ-Deutsche Sammlung von
Mikroorganismen und Zellkulturen GmbH 


19-3-2 


Address of depositary institution 


Mascheroder Weg 1b, D-38124
Braunschweig, Germany


19-3-3 


Date of deposit 


21 June 2002 (21.06.2002) 


19-3-4


Accession Number 


DSMZ 15066


19-4 


Additional Indications 


NONE 


19-5 


Designated States for Which 
Indications are Made 


all designated States 


19-6 


Separate Furnishing of Indications 

These indications will be submitted to 
the International Bureau later 


NONE 


20 


The indications made below relate to
the deposited microorganism(s) or
other biological material referred to 
in the description on: 




20-1 


page 


69-70 


20-2 


line


35-6 






20-3 
20-3-1 

20-3-2 

20-3-3 
20-3-4 


Identification of Deposit 
Name of depositary institution 

Address of depositary institution 

Date of deposit 
Accession Number 


DSMZ-Deutsche Sammlung von
Mikroorganismen und Zellkulturen GmbH
Mascheroder Weg 1b, D-38124
Braunschweig, Germany

21 June 2002 (21.06.2002) 

DSMZ 15065 


20-4 


Additional Indications 


NONE 


20-5 


Designated States for Which 
Indications are Made 


all designated States 


20-6 


Separate Furnishing of Indications 

These indications will be submitted to 
the International Bureau later 


NONE 


21 

21-1 
21-2 


The indications made below relate to 
the deposited microorganism(s) or 
other biological material referred to 
in the description on: 
page 

line 


70 

8-16 


21-3 
21-3-1 

21-3-2 

21-3-3 
21-3-4


Identification of Deposit 
Name of depositary institution 

Address of depositary institution 

Date of deposit 
Accession Number 


DSMZ-Deutsche Sammlung von
Mikroorganismen und Zellkulturen GmbH
Mascheroder Weg 1b, D-38124
Braunschweig, Germany

21 June 2002 (21.06.2002)

DSMZ 15064 


21-4 


Additional Indications 


NONE 


21-5 


Designated States for Which 
Indications are Made 


all designated States 




Separate Furnishing of Indications 

These indications will be submitted to 
the International Bureau later 


NONE 


22 

22-1 
22-2 


The indications made below relate to 
the deposited microorganism(s) or
other biological material referred to
in the description on: 
page 

line 


70 

18-26 


22-3 
22-3-1 

22-3-2 

22-3-3 
22-3-4


Identification of Deposit 
Name of depositary institution 

Address of depositary institution 

Date of deposit 
Accession Number 


DSMZ-Deutsche Sammlung von
Mikroorganismen und Zellkulturen GmbH
Mascheroder Weg 1b, D-38124

Braunschweig, Germany 

21 June 2002 (21.06.2002) 

DSMZ 15067 


22-4 


Additional Indications


NONE 


22-5 


Designated States for Which 
Indications are Made


all designated States 




Claims 






1. A polypeptide having cellobiohydrolase I activity, selected from the group consisting of: 

(a) a polypeptide comprising an amino acid sequence selected from the group consisting of:
an amino acid sequence which has at least 80% identity with amino acids 1 to 526 of
SEQ ID NO:2,

an amino acid sequence which has at least 80% identity with amino acids 1 to 529 of
SEQ ID NO:4,

an amino acid sequence which has at least 80% identity with amino acids 1 to 451 of
SEQ ID NO:6,

an amino acid sequence which has at least 80% identity with amino acids 1 to 457 of
SEQ ID NO:8,

an amino acid sequence which has at least 80% identity with amino acids 1 to 538 of
SEQ ID NO:10,

an amino acid sequence which has at least 70% identity with amino acids 1 to 415 of
SEQ ID NO:12,

an amino acid sequence which has at least 70% identity with amino acids 1 to 447 of
SEQ ID NO:14,

an amino acid sequence which has at least 80% identity with amino acids 1 to 452 of
SEQ ID NO:16,

an amino acid sequence which has at least 80% identity with amino acids 1 to 454 of
SEQ ID NO:38,

an amino acid sequence which has at least 80% identity with amino acids 1 to 458 of
SEQ ID NO:40,

an amino acid sequence which has at least 80% identity with amino acids 1 to 450 of
SEQ ID NO:42,

an amino acid sequence which has at least 80% identity with amino acids 1 to 446 of
SEQ ID NO:44,

an amino acid sequence which has at least 80% identity with amino acids 1 to 527 of
SEQ ID NO:46,

an amino acid sequence which has at least 80% identity with amino acids 1 to 455 of
SEQ ID NO:48,

an amino acid sequence which has at least 80% identity with amino acids 1 to 464 of
SEQ ID NO:50,

an amino acid sequence which has at least 80% identity with amino acids 1 to 460 of
SEQ ID NO:52,

an amino acid sequence which has at least 80% identity with amino acids 1 to 450 of
SEQ ID NO:54,

an amino acid sequence which has at least 80% identity with amino acids 1 to 532 of
SEQ ID NO:56,

an amino acid sequence which has at least 80% identity with amino acids 1 to 460 of
SEQ ID NO:58,

an amino acid sequence which has at least 80% identity with amino acids 1 to 525 of
SEQ ID NO:60, and

an amino acid sequence which has at least 80% identity with amino acids 1 to 456 of
SEQ ID NO:66;

(b) a polypeptide comprising an amino acid sequence selected from the group consisting of: 
an amino acid sequence which has at least 80% identity with the polypeptide encoded by
the cellobiohydrolase I encoding part of the nucleotide sequence present in Acremonium 
thermophilum, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Chaetomium 
thermophilum, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
the cellobiohydrolase I encoding part of the nucleotide sequence present in Scytalidium 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Scytalidium 
thermophilum, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in
Thermoascus aurantiacus,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Thielavia 
australiensis, 

an amino acid sequence which has at least 70% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Verticillium 
tenerum, 

an amino acid sequence which has at least 70% identity with the polypeptide encoded by
the cellobiohydrolase I encoding part of the nucleotide sequence present in Neotermes
castaneus, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
the cellobiohydrolase I encoding part of the nucleotide sequence present in
Melanocarpus albomyces, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded 
by the cellobiohydrolase I encoding part of the nucleotide sequence present in 
Acremonium sp., 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in 
Chaetomidium pingtungium, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in 
Sporotrichum pruinosum, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Diplodia 
gossypina, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Trichophaea 
saccata, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in 
Myceliophthora thermophila, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Exidia 
glandulosa, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Xylaria 
hypoxylon, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Poitrasia 
circinans, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in Coprinus 
cinereus,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
the cellobiohydrolase I encoding part of the nucleotide sequence present in 

Pseudoplectania nigrella, 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the
nucleotide sequence present in Trichothecium roseum IFO 5372,

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the

nucleotide sequence present in Humicola nigrescens CBS 819.73, 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 

nucleotide sequence present in Cladorrhinum foecundissimum CBS 427.97, 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 

nucleotide sequence present in Diplodia gossypina CBS 247.96, 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 

nucleotide sequence present in Myceliophthora thermophila CBS 117.65,

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the

nucleotide sequence present in Rhizomucor pusillus CBS 109471, 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 

nucleotide sequence present in Meripilus giganteus CBS 521.95, 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 

nucleotide sequence present in Exidia glandulosa CBS 277.96,

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 

nucleotide sequence present in Xylaria hypoxylon CBS 284.96, 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 

nucleotide sequence present in Trichophaea saccata CBS 604.70, 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 

nucleotide sequence present in Chaetomium sp., 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 
nucleotide sequence present in Myceliophthora hinnulea, 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 
nucleotide sequence present in Thielavia cf. microspora, 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 
nucleotide sequence present in Aspergillus sp., 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 
nucleotide sequence present in Scopulariopsis sp., 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 
nucleotide sequence present in Fusarium sp., 

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 
nucleotide sequence present in Verticillium sp., and

an amino acid sequence encoded by the cellobiohydrolase I encoding part of the 
nucleotide sequence present in Phytophthora infestans;

(c) a polypeptide comprising an amino acid sequence selected from the group consisting of: 
an amino acid sequence which has at least 80% identity with the polypeptide encoded by
nucleotides 1 to 1578 of SEQ ID NO:1,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1587 of SEQ ID NO:3, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1353 of SEQ ID NO:5, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1371 of SEQ ID NO:7, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1614 of SEQ ID NO:9, 

an amino acid sequence which has at least 70% identity with the polypeptide encoded by 
nucleotides 1 to 1245 of SEQ ID NO:11, 

an amino acid sequence which has at least 70% identity with the polypeptide encoded by 
nucleotides 1 to 1341 of SEQ ID NO:13, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1356 of SEQ ID NO:15, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1365 of SEQ ID NO:37, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1377 of SEQ ID NO:39, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1353 of SEQ ID NO:41, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1341 of SEQ ID NO:43, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1584 of SEQ ID NO:45, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1368 of SEQ ID NO:47, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1395 of SEQ ID NO:49, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1383 of SEQ ID NO:51, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1353 of SEQ ID NO:53, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1599 of SEQ ID NO:55, 

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1383 of SEQ ID NO:57,

an amino acid sequence which has at least 80% identity with the polypeptide encoded by
nucleotides 1 to 1578 of SEQ ID NO:59, and

an amino acid sequence which has at least 80% identity with the polypeptide encoded by 
nucleotides 1 to 1371 of SEQ ID NO:65;

(d) a polypeptide which is encoded by a nucleotide sequence which hybridizes under high 
stringency conditions with a polynucleotide probe selected from the group consisting of: 

(i) the complementary strand of the nucleotides selected from the group consisting of: 
nucleotides 1 to 1578 of SEQ ID NO:1, 

nucleotides 1 to 1587 of SEQ ID NO:3, 
nucleotides 1 to 1353 of SEQ ID NO:5, 
nucleotides 1 to 1371 of SEQ ID NO:7,
nucleotides 1 to 1614 of SEQ ID NO:9,
nucleotides 1 to 1245 of SEQ ID NO:11,
nucleotides 1 to 1341 of SEQ ID NO:13,
nucleotides 1 to 1356 of SEQ ID NO:15,
nucleotides 1 to 1365 of SEQ ID NO:37, 
nucleotides 1 to 1377 of SEQ ID NO:39, 
nucleotides 1 to 1353 of SEQ ID NO:41, 
nucleotides 1 to 1341 of SEQ ID NO:43, 
nucleotides 1 to 1584 of SEQ ID NO:45, 
nucleotides 1 to 1368 of SEQ ID NO:47, 
nucleotides 1 to 1395 of SEQ ID NO:49, 
nucleotides 1 to 1383 of SEQ ID NO:51, 
nucleotides 1 to 1353 of SEQ ID NO:53, 
nucleotides 1 to 1599 of SEQ ID NO:55, 
nucleotides 1 to 1383 of SEQ ID NO:57, 
nucleotides 1 to 1578 of SEQ ID NO:59, and 
nucleotides 1 to 1371 of SEQ ID NO:65; 

(ii) the complementary strand of the nucleotides selected from the group consisting of: 
nucleotides 1 to 500 of SEQ ID NO:1,
nucleotides 1 to 500 of SEQ ID NO:3,
nucleotides 1 to 500 of SEQ ID NO:5,
nucleotides 1 to 500 of SEQ ID NO:7,
nucleotides 1 to 500 of SEQ ID NO:9,
nucleotides 1 to 500 of SEQ ID NO:11,
nucleotides 1 to 500 of SEQ ID NO:13,
nucleotides 1 to 500 of SEQ ID NO:15,
nucleotides 1 to 500 of SEQ ID NO:37,
nucleotides 1 to 500 of SEQ ID NO:39,
nucleotides 1 to 500 of SEQ ID NO:41,
nucleotides 1 to 500 of SEQ ID NO:43,
nucleotides 1 to 500 of SEQ ID NO:45,
nucleotides 1 to 500 of SEQ ID NO:47,
nucleotides 1 to 500 of SEQ ID NO:49,
nucleotides 1 to 500 of SEQ ID NO:51,
nucleotides 1 to 500 of SEQ ID NO:53,
nucleotides 1 to 500 of SEQ ID NO:55,
nucleotides 1 to 500 of SEQ ID NO:57,
nucleotides 1 to 500 of SEQ ID NO:59,
nucleotides 1 to 500 of SEQ ID NO:65,
nucleotides 1 to 221 of SEQ ID NO:17,
nucleotides 1 to 239 of SEQ ID NO:18,
nucleotides 1 to 199 of SEQ ID NO:19,
nucleotides 1 to 191 of SEQ ID NO:20,
nucleotides 1 to 232 of SEQ ID NO:21,
nucleotides 1 to 467 of SEQ ID NO:22,
nucleotides 1 to 534 of SEQ ID NO:23,
nucleotides 1 to 563 of SEQ ID NO:24,
nucleotides 1 to 218 of SEQ ID NO:25,
nucleotides 1 to 492 of SEQ ID NO:26,
nucleotides 1 to 481 of SEQ ID NO:27,
nucleotides 1 to 463 of SEQ ID NO:28,
nucleotides 1 to 513 of SEQ ID NO:29,
nucleotides 1 to 579 of SEQ ID NO:30,
nucleotides 1 to 514 of SEQ ID NO:31,
nucleotides 1 to 477 of SEQ ID NO:32,
nucleotides 1 to 500 of SEQ ID NO:33,
nucleotides 1 to 470 of SEQ ID NO:34,
nucleotides 1 to 491 of SEQ ID NO:35,
nucleotides 1 to 221 of SEQ ID NO:36,
nucleotides 1 to 519 of SEQ ID NO:61,
nucleotides 1 to 497 of SEQ ID NO:62,
nucleotides 1 to 498 of SEQ ID NO:63,
nucleotides 1 to 525 of SEQ ID NO:64, and
nucleotides 1 to 951 of SEQ ID NO:67; and
(iii) the complementary strand of the nucleotides selected from the group consisting of:

(e) a fragment of (a), (b) or (c) that has cellobiohydrolase I activity. 

2. The polypeptide according to claim 1, comprising an amino acid sequence selected from 
the group consisting of: 

an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 526 of SEQ ID NO:2,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 529 of SEQ ID NO:4,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 451 of SEQ ID NO:6,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 457 of SEQ ID NO:8,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 538 of SEQ ID NO:10,
an amino acid sequence which has at least 75% identity, preferably at least 80% identity, more
preferably at least 90% identity, with amino acids 1 to 415 of SEQ ID NO:12,
an amino acid sequence which has at least 75% identity, preferably at least 80% identity, more
preferably at least 90% identity, with amino acids 1 to 447 of SEQ ID NO:14,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 452 of SEQ ID NO:16,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 454 of SEQ ID NO:38,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 458 of SEQ ID NO:40,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 450 of SEQ ID NO:42,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 446 of SEQ ID NO:44,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 527 of SEQ ID NO:46,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 455 of SEQ ID NO:48,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 464 of SEQ ID NO:50,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 460 of SEQ ID NO:52,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 450 of SEQ ID NO:54,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 532 of SEQ ID NO:56,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 460 of SEQ ID NO:58,
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 525 of SEQ ID NO:60, and
an amino acid sequence which has at least 85% identity, preferably at least 90% identity, more
preferably at least 95% identity, with amino acids 1 to 456 of SEQ ID NO:66.
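
The identity thresholds recited in claims 1 and 2 can be illustrated with a short calculation. The following Python sketch is only an illustration under stated assumptions: it takes two sequences that have already been aligned (gaps written as "-") and reports identical positions as a percentage of the aligned columns. The function names percent_identity and meets_threshold, the gap symbol and the choice of denominator are assumptions made for this example. The alignment program and scoring parameters that govern the claims are those defined in the description, and they may give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of aligned columns that hold the same residue.

    Assumption for this sketch: both inputs come from one pairwise
    alignment, so they have equal length and use `gap` for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as identical only if both residues match and are not gaps.
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """True if the identity is at least `threshold` percent (hypothetical helper)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Toy 10-residue sequences, not taken from the sequence listing.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
    print(meets_threshold("ACDEFGHIKL", "ACDEFGHIKV", 95.0))
```

In this toy case nine of ten columns match, so the pair would satisfy an 80% or 90% threshold but not a 95% threshold.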


3. The polypeptide according to any of claims 1-2, which consists of an amino acid sequence
selected from the group consisting of:
amino acids 1 to 526 of SEQ ID NO:2,
amino acids 1 to 529 of SEQ ID NO:4,
amino acids 1 to 451 of SEQ ID NO:6,
amino acids 1 to 457 of SEQ ID NO:8,
amino acids 1 to 538 of SEQ ID NO:10,
amino acids 1 to 415 of SEQ ID NO:12,
amino acids 1 to 447 of SEQ ID NO:14,
amino acids 1 to 452 of SEQ ID NO:16,
amino acids 1 to 454 of SEQ ID NO:38,
amino acids 1 to 458 of SEQ ID NO:40,
amino acids 1 to 450 of SEQ ID NO:42,
amino acids 1 to 446 of SEQ ID NO:44,
amino acids 1 to 527 of SEQ ID NO:46,
amino acids 1 to 455 of SEQ ID NO:48,
amino acids 1 to 464 of SEQ ID NO:50,
amino acids 1 to 460 of SEQ ID NO:52,
amino acids 1 to 450 of SEQ ID NO:54,
amino acids 1 to 532 of SEQ ID NO:56,
amino acids 1 to 460 of SEQ ID NO:58,
amino acids 1 to 525 of SEQ ID NO:60, and
amino acids 1 to 456 of SEQ ID NO:66.



4. The polypeptide according to any of claims 1-2, where the polypeptide is an artificial variant
which comprises an amino acid sequence that has at least one substitution, deletion and/or
insertion of an amino acid as compared to an amino acid sequence selected from the group
consisting of:
amino acids 1 to 526 of SEQ ID NO:2,
amino acids 1 to 529 of SEQ ID NO:4,
amino acids 1 to 451 of SEQ ID NO:6,
amino acids 1 to 457 of SEQ ID NO:8,
amino acids 1 to 538 of SEQ ID NO:10,
amino acids 1 to 415 of SEQ ID NO:12,
amino acids 1 to 447 of SEQ ID NO:14,
amino acids 1 to 452 of SEQ ID NO:16,
amino acids 1 to 454 of SEQ ID NO:38,
amino acids 1 to 458 of SEQ ID NO:40,
amino acids 1 to 450 of SEQ ID NO:42,
amino acids 1 to 446 of SEQ ID NO:44,
amino acids 1 to 527 of SEQ ID NO:46,
amino acids 1 to 455 of SEQ ID NO:48,
amino acids 1 to 464 of SEQ ID NO:50,
amino acids 1 to 460 of SEQ ID NO:52,
amino acids 1 to 450 of SEQ ID NO:54,
amino acids 1 to 532 of SEQ ID NO:56,
amino acids 1 to 460 of SEQ ID NO:58,
amino acids 1 to 525 of SEQ ID NO:60, and
amino acids 1 to 456 of SEQ ID NO:66.



5. The polypeptide according to claim 1, comprising an amino acid sequence selected from 
the group consisting of: 

an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with 

the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence
inserted into a plasmid present in the deposited microorganism CGMCC No. 0584, 
an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with 
the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence 
inserted into a plasmid present in the deposited microorganism CGMCC No. 0581, 

an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with
the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence 
inserted into a plasmid present in the deposited microorganism CGMCC No. 0585, 
an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with 
the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence 

inserted into a plasmid present in the deposited microorganism CGMCC No. 0582,

an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with 
the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence 
inserted into a plasmid present in the deposited microorganism CGMCC No. 0583, 
an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with 

the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence
inserted into a plasmid present in the deposited microorganism CBS 109513, 
an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with 
the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence 
inserted into a plasmid present in the deposited microorganism DSM 14348, 

an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with
the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence 
inserted into a plasmid present in the deposited microorganism CGMCC No. 0580,
an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with
the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence 
inserted into a plasmid present in the deposited microorganism CGMCC No. 0747, 
an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with 
the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence
inserted into a plasmid present in the deposited microorganism CGMCC No. 0748, 
an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with 
the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence 
inserted into a plasmid present in the deposited microorganism CGMCC No. 0749, 

an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with
the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence 
inserted into a plasmid present in the deposited microorganism CGMCC No. 0750, 
an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with 
the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence 

inserted into a plasmid present in the deposited microorganism DSM 15064,

an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with 
the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence 
inserted into a plasmid present in the deposited microorganism DSM 15065, 
an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with 

the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence
inserted into a plasmid present in the deposited microorganism DSM 15066, and 
an amino acid sequence which has at least 80% identity, preferably at least 90% identity, with 
the polypeptide encoded by the cellobiohydrolase I encoding part of the nucleotide sequence 
inserted into a plasmid present in the deposited microorganism DSM 15067. 


6. The polypeptide according to claim 5, which comprises the amino acid sequence encoded
by the cellobiohydrolase I encoding part of the nucleotide sequence inserted into a plasmid
present in a deposited microorganism selected from the group consisting of:
CGMCC No. 0584,
CGMCC No. 0581,
CGMCC No. 0585,
CGMCC No. 0582,
CGMCC No. 0583,
CBS 109513,
DSM 14348,
CGMCC No. 0580,
CGMCC No. 0747,
CGMCC No. 0748,
CGMCC No. 0749,
CGMCC No. 0750,
DSM 15064,
DSM 15065,
DSM 15066, and
DSM 15067.



7. The polypeptide according to claim 5 or 6, which consists of the amino acid sequence
encoded by the cellobiohydrolase I encoding part of the nucleotide sequence inserted into a
plasmid present in a deposited microorganism selected from the group consisting of:
CGMCC No. 0584,
CGMCC No. 0581,
CGMCC No. 0585,
CGMCC No. 0582,
CGMCC No. 0583,
CBS 109513,
DSM 14348,
CGMCC No. 0580,
CGMCC No. 0747,
CGMCC No. 0748,
CGMCC No. 0749,
CGMCC No. 0750,
DSM 15064,
DSM 15065,
DSM 15066, and
DSM 15067.



8. The polypeptide according to claims 5 or 6, where the polypeptide is an artificial variant
which comprises an amino acid sequence that has at least one substitution, deletion and/or
insertion of an amino acid as compared to the amino acid sequence encoded by the
cellobiohydrolase I encoding part of the nucleotide sequence inserted into a plasmid present in
a deposited microorganism selected from the group consisting of:
CGMCC No. 0584,
CGMCC No. 0581,
CGMCC No. 0585,
CGMCC No. 0582,
CGMCC No. 0583,
CBS 109513,
DSM 14348,
CGMCC No. 0580,
CGMCC No. 0747,
CGMCC No. 0748,
CGMCC No. 0749,
CGMCC No. 0750,
DSM 15064,
DSM 15065,
DSM 15066, and
DSM 15067.

9. A polynucleotide having a nucleotide sequence which encodes the polypeptide defined
in any of claims 1-8.

10. A nucleic acid construct comprising the nucleotide sequence defined in claim 9 operably
linked to one or more control sequences that direct the production of the polypeptide in a
suitable host.

11. A recombinant expression vector comprising the nucleic acid construct defined in claim 10.

12. A recombinant host cell comprising the nucleic acid construct defined in claim 11.

13. A method for producing a polypeptide as defined in any of claims 1-8, the method
comprising:
(a) cultivating a strain, which in its wild-type form is capable of producing the polypeptide, to
produce the polypeptide; and
(b) recovering the polypeptide.

14. A method for producing a polypeptide as defined in any of claims 1-8, the method
comprising:
(a) cultivating a recombinant host cell as defined in claim 12 under conditions conducive for
production of the polypeptide; and
(b) recovering the polypeptide.

15. A method for in-situ production of a polypeptide as defined in any of claims 1-8, the
method comprising:
(a) cultivating a recombinant host cell as defined in claim 12 under conditions conducive for
production of the polypeptide; and
(b) contacting the polypeptide with a desired substrate without prior recovery of the
polypeptide.

16. A polynucleotide comprising a nucleotide sequence selected from the group consisting of:
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1578 of SEQ ID NO:1,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1587 of SEQ ID NO:3,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1353 of SEQ ID NO:5,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1371 of SEQ ID NO:7,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1614 of SEQ ID NO:9,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1245 of SEQ ID NO:11,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1341 of SEQ ID NO:13,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1356 of SEQ ID NO:15,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1365 of SEQ ID NO:37,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1377 of SEQ ID NO:39,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1353 of SEQ ID NO:41,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1341 of SEQ ID NO:43,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1584 of SEQ ID NO:45,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1368 of SEQ ID NO:47,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1395 of SEQ ID NO:49,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1383 of SEQ ID NO:51,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1353 of SEQ ID NO:53,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1599 of SEQ ID NO:55,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1383 of SEQ ID NO:57,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1578 of SEQ ID NO:59,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 1371 of SEQ ID NO:65,

a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:1,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:3,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:5,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:7,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:9,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:11,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:13,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:15,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:37,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:39,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:41,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:43,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:45,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:47,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:49,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:51,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:53,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:55,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:57,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:59,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:65,

a nucleotide sequence which has at least 80% identity with nucleotides 1 to 221 of SEQ ID NO:17,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 239 of SEQ ID NO:18,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 199 of SEQ ID NO:19,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 191 of SEQ ID NO:20,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 232 of SEQ ID NO:21,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 467 of SEQ ID NO:22,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 534 of SEQ ID NO:23,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 563 of SEQ ID NO:24,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 218 of SEQ ID NO:25,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 492 of SEQ ID NO:26,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 481 of SEQ ID NO:27,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 463 of SEQ ID NO:28,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 513 of SEQ ID NO:29,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 579 of SEQ ID NO:30,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 514 of SEQ ID NO:31,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 477 of SEQ ID NO:32,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 500 of SEQ ID NO:33,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 470 of SEQ ID NO:34,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 491 of SEQ ID NO:35,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 221 of SEQ ID NO:36,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 519 of SEQ ID NO:61,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 497 of SEQ ID NO:62,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 498 of SEQ ID NO:63,
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 525 of SEQ ID NO:64, and
a nucleotide sequence which has at least 80% identity with nucleotides 1 to 951 of SEQ ID NO:67.
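
By way of illustration only, the "at least 80% identity" criterion recited in claim 16 can be sketched in Python for two sequences that have already been aligned. This is not the method of determination used in this application: the function names percent_identity and meets_identity_threshold, the gap character "-", and the choice of denominator (alignment columns that are not gapped in both sequences) are assumptions made for the example, and a dedicated alignment program may give a different figure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity between two pre-aligned sequences.

    Assumption: '-' marks a gap, and columns that are gaps in both
    sequences are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 80.0) -> bool:
    """True when the percent identity is at least the given threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


reference = "ATGCACGCCAAG"  # first twelve nucleotides of SEQ ID NO:1
candidate = "ATGCATGCCAAG"  # hypothetical variant, one substitution
print(round(percent_identity(reference, candidate), 1))
print(meets_identity_threshold(reference, candidate))
```

In this toy comparison 11 of 12 positions match, so the hypothetical variant lies above the 80% threshold.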



17. A polynucleotide comprising a nucleotide sequence selected from the group consisting of: 
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence inserted into a plasmid present in the deposited 
microorganism CGMCC No. 0584,

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence inserted into a plasmid present in the deposited
microorganism CGMCC No. 0581,

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence inserted into a plasmid present in the deposited 
microorganism CGMCC No. 0585, 
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding
part of the nucleotide sequence inserted into a plasmid present in the deposited 
microorganism CGMCC No. 0582, 

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence inserted into a plasmid present in the deposited 
microorganism CGMCC No. 0583,

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence inserted into a plasmid present in the deposited 
microorganism CBS 109513, 

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence inserted into a plasmid present in the deposited
microorganism DSM 14348, 

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence inserted into a plasmid present in the deposited 
microorganism CGMCC No. 0580, 
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding
part of the nucleotide sequence inserted into a plasmid present in the deposited 
microorganism CGMCC No. 0747, 

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence inserted into a plasmid present in the deposited 
microorganism CGMCC No. 0748,

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence inserted into a plasmid present in the deposited 
microorganism CGMCC No. 0749, 

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence inserted into a plasmid present in the deposited
microorganism CGMCC No. 0750, 

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence inserted into a plasmid present in the deposited 
microorganism DSM 15064, 
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding
part of the nucleotide sequence inserted into a plasmid present in the deposited
microorganism DSM 15065,
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding
part of the nucleotide sequence inserted into a plasmid present in the deposited 
microorganism DSM 15066, and 

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence inserted into a plasmid present in the deposited
microorganism DSM 15067. 

18. A polynucleotide comprising a nucleotide sequence selected from the group consisting of: 
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Trichothecium roseum IFO
5372, 

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Humicola nigrescens CBS 
819.73, 

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding
part of the nucleotide sequence present in the microorganism Cladorrhinum foecundissimum 
CBS 427.97, 

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Diplodia gossypina CBS 247.96,
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding
part of the nucleotide sequence present in the microorganism Myceliophthora thermophila 
CBS 117.65, 

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Rhizomucor pusillus CBS 
109471,

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Meripilus giganteus CBS
521.95, 

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Exidia glandulosa CBS 2377.96,
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Xylaria hypoxylon CBS 284.96, 
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Trichophaea saccata CBS 
804.70,

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Acremonium sp.,
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding
part of the nucleotide sequence present in the microorganism Chaetomium sp., 
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Chaetomidium pingtungium, 
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding
part of the nucleotide sequence present in the microorganism Myceliophthora thermophila, 
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Myceliophthora hinnulea, 
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 

part of the nucleotide sequence present in the microorganism Sporotrichum pruinosum,

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Thielavia cf. microspora,
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Scytalidium sp., 

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding
part of the nucleotide sequence present in the microorganism Aspergillus sp., 
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Scopulariopsis sp., 
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 

part of the nucleotide sequence present in the microorganism Fusarium sp.,

a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Verticillium sp., and
a nucleotide sequence which has at least 80% identity with the cellobiohydrolase I encoding 
part of the nucleotide sequence present in the microorganism Phytophthora infestans. 


19. A polynucleotide having a nucleotide sequence which encodes a polypeptide having
cellobiohydrolase I activity, and which hybridizes under high stringency conditions with a
polynucleotide probe selected from the group consisting of:

(i) the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 1 to 1578 of SEQ ID NO:1,
nucleotides 1 to 1587 of SEQ ID NO:3,
nucleotides 1 to 1353 of SEQ ID NO:5,
nucleotides 1 to 1371 of SEQ ID NO:7,
nucleotides 1 to 1614 of SEQ ID NO:9,
nucleotides 1 to 1245 of SEQ ID NO:11,
nucleotides 1 to 1341 of SEQ ID NO:13,
nucleotides 1 to 1356 of SEQ ID NO:15,
nucleotides 1 to 1365 of SEQ ID NO:37,
nucleotides 1 to 1377 of SEQ ID NO:39,
nucleotides 1 to 1353 of SEQ ID NO:41,
nucleotides 1 to 1341 of SEQ ID NO:43,
nucleotides 1 to 1584 of SEQ ID NO:45,
nucleotides 1 to 1368 of SEQ ID NO:47,
nucleotides 1 to 1395 of SEQ ID NO:49,
nucleotides 1 to 1383 of SEQ ID NO:51,
nucleotides 1 to 1353 of SEQ ID NO:53,
nucleotides 1 to 1599 of SEQ ID NO:55,
nucleotides 1 to 1383 of SEQ ID NO:57,
nucleotides 1 to 1578 of SEQ ID NO:59, and
nucleotides 1 to 1371 of SEQ ID NO:65;
(ii) the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 1 to 500 of SEQ ID NO:1,
nucleotides 1 to 500 of SEQ ID NO:3,
nucleotides 1 to 500 of SEQ ID NO:5,
nucleotides 1 to 500 of SEQ ID NO:7,
nucleotides 1 to 500 of SEQ ID NO:9,
nucleotides 1 to 500 of SEQ ID NO:11,
nucleotides 1 to 500 of SEQ ID NO:13,
nucleotides 1 to 500 of SEQ ID NO:15,
nucleotides 1 to 500 of SEQ ID NO:37,
nucleotides 1 to 500 of SEQ ID NO:39,
nucleotides 1 to 500 of SEQ ID NO:41,
nucleotides 1 to 500 of SEQ ID NO:43,
nucleotides 1 to 500 of SEQ ID NO:45,
nucleotides 1 to 500 of SEQ ID NO:47,
nucleotides 1 to 500 of SEQ ID NO:49,
nucleotides 1 to 500 of SEQ ID NO:51,
nucleotides 1 to 500 of SEQ ID NO:53,
nucleotides 1 to 500 of SEQ ID NO:55,
nucleotides 1 to 500 of SEQ ID NO:57,
nucleotides 1 to 500 of SEQ ID NO:59,
nucleotides 1 to 500 of SEQ ID NO:65,
nucleotides 1 to 221 of SEQ ID NO:17,
nucleotides 1 to 239 of SEQ ID NO:18,
nucleotides 1 to 199 of SEQ ID NO:19,
nucleotides 1 to 191 of SEQ ID NO:20,
nucleotides 1 to 232 of SEQ ID NO:21,
nucleotides 1 to 467 of SEQ ID NO:22,
nucleotides 1 to 534 of SEQ ID NO:23,
nucleotides 1 to 563 of SEQ ID NO:24,
nucleotides 1 to 218 of SEQ ID NO:25,
nucleotides 1 to 492 of SEQ ID NO:26,
nucleotides 1 to 481 of SEQ ID NO:27,
nucleotides 1 to 463 of SEQ ID NO:28,
nucleotides 1 to 513 of SEQ ID NO:29,
nucleotides 1 to 579 of SEQ ID NO:30,
nucleotides 1 to 514 of SEQ ID NO:31,
nucleotides 1 to 477 of SEQ ID NO:32,
nucleotides 1 to 500 of SEQ ID NO:33,
nucleotides 1 to 470 of SEQ ID NO:34,
nucleotides 1 to 491 of SEQ ID NO:35,
nucleotides 1 to 221 of SEQ ID NO:36,
nucleotides 1 to 519 of SEQ ID NO:61,
nucleotides 1 to 497 of SEQ ID NO:62,
nucleotides 1 to 498 of SEQ ID NO:63,
nucleotides 1 to 525 of SEQ ID NO:64, and
nucleotides 1 to 951 of SEQ ID NO:67; and
(iii) the complementary strand of the nucleotides selected from the group consisting of:
nucleotides 1 to 200 of SEQ ID NO:1,
nucleotides 1 to 200 of SEQ ID NO:3,
nucleotides 1 to 200 of SEQ ID NO:5,
nucleotides 1 to 200 of SEQ ID NO:7,
nucleotides 1 to 200 of SEQ ID NO:9,
nucleotides 1 to 200 of SEQ ID NO:11,
nucleotides 1 to 200 of SEQ ID NO:13,
nucleotides 1 to 200 of SEQ ID NO:15,
nucleotides 1 to 200 of SEQ ID NO:37,
nucleotides 1 to 200 of SEQ ID NO:39,
nucleotides 1 to 200 of SEQ ID NO:41,
nucleotides 1 to 200 of SEQ ID NO:43,
nucleotides 1 to 200 of SEQ ID NO:45,
nucleotides 1 to 200 of SEQ ID NO:47,
nucleotides 1 to 200 of SEQ ID NO:49,
nucleotides 1 to 200 of SEQ ID NO:51,
nucleotides 1 to 200 of SEQ ID NO:53,
nucleotides 1 to 200 of SEQ ID NO:55,
nucleotides 1 to 200 of SEQ ID NO:57,
nucleotides 1 to 200 of SEQ ID NO:59, and
nucleotides 1 to 200 of SEQ ID NO:65.
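
The probes of claim 19 are defined as the complementary strand of a numbered nucleotide range. As a minimal sketch of what that means for a given range, the Python function below takes a 1-based inclusive range of a coding-strand sequence and returns its complement written 5' to 3' (the reverse complement). The function name complementary_strand, the restriction to unambiguous A, C, G and T, and the 5' to 3' writing convention are assumptions made for the example and are not taken from this application.

```python
from typing import Optional

# Translation table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complementary_strand(seq: str, start: int = 1, end: Optional[int] = None) -> str:
    """Reverse complement of nucleotides start..end (1-based, inclusive).

    Assumption: the result is written 5' to 3', so the complemented
    region is also reversed.
    """
    if end is None:
        end = len(seq)
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range must lie within the sequence")
    region = seq[start - 1:end]
    return region.translate(_COMPLEMENT)[::-1]


coding = "ATGCACGCCAAG"  # first twelve nucleotides of SEQ ID NO:1
print(complementary_strand(coding))        # whole fragment
print(complementary_strand(coding, 1, 6))  # nucleotides 1 to 6 only
```

For a probe such as "nucleotides 1 to 200 of SEQ ID NO:1", the same call would be made with start=1 and end=200 on the full sequence.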

20. A polynucleotide comprising a modified nucleotide sequence selected from the group
consisting of:
the nucleotide sequence of SEQ ID NO:1 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 526
of SEQ ID NO:2,
the nucleotide sequence of SEQ ID NO:3 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 529
of SEQ ID NO:4,
the nucleotide sequence of SEQ ID NO:5 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 451
of SEQ ID NO:6,
the nucleotide sequence of SEQ ID NO:7 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 457
of SEQ ID NO:8,
the nucleotide sequence of SEQ ID NO:9 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 538
of SEQ ID NO:10,
the nucleotide sequence of SEQ ID NO:11 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 415
of SEQ ID NO:12,
the nucleotide sequence of SEQ ID NO:13 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 447
of SEQ ID NO:14,
the nucleotide sequence of SEQ ID NO:15 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 452
of SEQ ID NO:16,
the nucleotide sequence of SEQ ID NO:37 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 454
of SEQ ID NO:38,
the nucleotide sequence of SEQ ID NO:39 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 458
of SEQ ID NO:40,
the nucleotide sequence of SEQ ID NO:41 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 450
of SEQ ID NO:42,
the nucleotide sequence of SEQ ID NO:43 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 446
of SEQ ID NO:44,
the nucleotide sequence of SEQ ID NO:45 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 527
of SEQ ID NO:46,
the nucleotide sequence of SEQ ID NO:47 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 455
of SEQ ID NO:48,
the nucleotide sequence of SEQ ID NO:49 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 464
of SEQ ID NO:50,
the nucleotide sequence of SEQ ID NO:51 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 460
of SEQ ID NO:52,
the nucleotide sequence of SEQ ID NO:53 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 450
of SEQ ID NO:54,
the nucleotide sequence of SEQ ID NO:55 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 532
of SEQ ID NO:56,
the nucleotide sequence of SEQ ID NO:57 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 460
of SEQ ID NO:58,
the nucleotide sequence of SEQ ID NO:59 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 525
of SEQ ID NO:60, and
the nucleotide sequence of SEQ ID NO:65 comprising at least one modification, where the
modified nucleotide sequence encodes a polypeptide which consists of amino acids 1 to 456
of SEQ ID NO:66.
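
Claim 20 covers nucleotide sequences that carry at least one modification yet still encode the recited polypeptide. A minimal Python sketch of such a check is given below: it translates two coding sequences and reports whether a modified sequence differs at the nucleotide level while encoding the same amino acids. The use of the standard genetic code, the function names translate and encodes_same_polypeptide, and the two variant sequences are assumptions made for the example and are not taken from this application.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence from its first base up to a stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break  # stop codon ends the polypeptide
        protein.append(aa)
    return "".join(protein)


def encodes_same_polypeptide(original: str, modified: str) -> bool:
    """True if the sequences differ but translate to the same polypeptide."""
    return (original.upper() != modified.upper()
            and translate(original) == translate(modified))


original = "ATGCACGCCAAG"   # first four codons of SEQ ID NO:1 (Met His Ala Lys)
silent = "ATGCATGCCAAG"     # hypothetical variant: CAC -> CAT, both His
missense = "ATGCAGGCCAAG"   # hypothetical variant: CAC -> CAG, His -> Gln
print(translate(original))
print(encodes_same_polypeptide(original, silent))
print(encodes_same_polypeptide(original, missense))
```

Only the first hypothetical variant would count as a modification of the kind recited, since the second changes the encoded amino acid.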


21. A polypeptide having cellobiohydrolase I activity which is encoded by the cellobiohydrolase 
I encoding part of the nucleotide sequence present in a microorganism selected from the 
group consisting of: 

a microorganism belonging to Zygomycota, preferably belonging to the Mucorales, more
preferably belonging to the family Mucoraceae or the family Choanephoraceae, most 
preferably belonging to the genus Rhizomucor or the genus Poitrasia, in particular Rhizomucor 
pusillus or Poitrasia circinans, 

a microorganism belonging to the Oomycetes, preferably belonging to the order Pythiales,
more preferably belonging to the family Pythiaceae, most preferably belonging to the genus
Phytophthora, in particular Phytophthora infestans, 

a microorganism belonging to Auriculariales, preferably belonging to the family Exidiaceae, 

more preferably belonging to the genus Exidia, most preferably Exidia glandulosa, 

a microorganism belonging to Xylariales, preferably belonging to the family Xylariaceae, more 

preferably belonging to the genus Xylaria, most preferably Xylaria hypoxylon,

a microorganism belonging to Dothideales, preferably belonging to the family Dothideaceae, 
more preferably belonging to the genus Diplodia, most preferably Diplodia gossypina, 
a microorganism belonging to Pezizales, preferably belonging to the family Pyronemataceae
or the family Sarcosomataceae, more preferably belonging to the genus Trichophaea or the
genus Pseudoplectania, most preferably Trichophaea saccata or Pseudoplectania nigrella,
a microorganism belonging to the family Rigidiporaceae, preferably belonging to the genus 
Meripilus, more preferably Meripilus giganteus, 

a microorganism belonging to the family Meruliaceae, preferably belonging to the genus 
Sporotrichum, more preferably Sporotrichum pruinosum,
a microorganism belonging to the family Agaricaceae (under Basidiomycota, Hymenomycetes,
Agaricales), more preferably belonging to the genus Coprinus, most preferably Coprinus
cinereus, 

a microorganism belonging to the family Hypocreaceae, preferably belonging to the genus 
Acremonium or the genus Verticillium, more preferably Acremonium thermophilum or 
Verticillium tenerum,

a microorganism belonging to the genus Cladorrhinum, preferably Cladorrhinum 
foecundissimum, 

a microorganism belonging to the genus Myceliophthora, preferably Myceliophthora 
thermophila or Myceliophthora hinnulea, 
a microorganism belonging to the genus Chaetomium, preferably Chaetomium thermophilum,
a microorganism belonging to the genus Chaetomidium, preferably Chaetomidium
pingtungium,
a microorganism belonging to the genus Thielavia, preferably Thielavia australiensis or
Thielavia microspora, 

a microorganism belonging to the genus Thermoascus, preferably Thermoascus aurantiacus, 
a microorganism belonging to the genus Trichothecium, preferably Trichothecium roseum, and 
a microorganism belonging to the species Humicola nigrescens.

22. A method for shuffling of DNA comprising using the polynucleotide as defined in any of 
claims 9 and 16-20. 

23. A polynucleotide encoding a polypeptide having cellobiase activity obtainable by the
method of claim 22. 

24. A polypeptide having cellobiase activity encoded by the polynucleotide of claim 23. 

25. Use of the polynucleotide as defined in any of claims 9 and 16-20 for DNA shuffling.

26. A method for producing ethanol from biomass, comprising contacting the biomass with the 
polypeptide as defined in any of claims 1-8. 

27. Use of the polypeptide as defined in any of claims 1-8 for producing ethanol.

28. A transgenic plant, plant part or plant cell, which has been transformed with a nucleotide 
sequence encoding a polypeptide having cellobiohydrolase I activity as defined in any of 
claims 1-8.

29. A detergent composition comprising a surfactant and the polypeptide according to any of 
claims 1-8.

SEQUENCE LISTING

<110> Novozymes A/S 

<120> Polypeptides having cellobiohydrolase I activity and polynucleotides 
encoding same 

<130> 10129-WO 

<160> 67 

<170> PatentIn version 3.1

<210> 1 

<211> 1581 

<212> DNA 

<213> Acremonium thermophilum 
<220> 

<221> CDS 

<222> (1)..(1581) 

<223> 

<400> 1 

atg cac gcc aag ttc gcg acc ctc gcc gcc ctt gtg gcg tcc gcc gcg 48
Met His Ala Lys Phe Ala Thr Leu Ala Ala Leu Val Ala Ser Ala Ala
1 5 10 15

gcc cag cag gcc tgc aca ctc acg gct gag aac cac ccc acc ctg tcg 96
Ala Gln Gln Ala Cys Thr Leu Thr Ala Glu Asn His Pro Thr Leu Ser
20 25 30

tgg tcc aag tgc acg tcc ggc ggc agc tgc acc agc gtc tcg ggc tcc 144
Trp Ser Lys Cys Thr Ser Gly Gly Ser Cys Thr Ser Val Ser Gly Ser
35 40 45

gtc acc atc gat gcc aac tgg cgg tgg act cac cag gtc tcg agc tcg 192
Val Thr Ile Asp Ala Asn Trp Arg Trp Thr His Gln Val Ser Ser Ser
50 55 60

acc aac tgc tac acg ggc aat gag tgg gac acg tcc atc tgc acc gac 240
Thr Asn Cys Tyr Thr Gly Asn Glu Trp Asp Thr Ser Ile Cys Thr Asp
65 70 75 80

ggt gct tcg tgc gcc gcc gcc tgc tgc ctc gat ggc gcc gac tac tcg 288
Gly Ala Ser Cys Ala Ala Ala Cys Cys Leu Asp Gly Ala Asp Tyr Ser
85 90 95

ggc acc tat ggc atc acc acc agc ggc aac gcc ctc agc ctc cag ttc 336
Gly Thr Tyr Gly Ile Thr Thr Ser Gly Asn Ala Leu Ser Leu Gln Phe
100 105 110

gtc act cag ggc ccc tac tcg acc aac att ggc tcg cgt acc tac ctg 384
Val Thr Gln Gly Pro Tyr Ser Thr Asn Ile Gly Ser Arg Thr Tyr Leu
115 120 125

atg gcc tcg gac acc aag tac cag atg ttc act ctg ctc ggc aac gag 432
Met Ala Ser Asp Thr Lys Tyr Gln Met Phe Thr Leu Leu Gly Asn Glu
130 135 140

ttc acc ttc gac gtg gac gtc aca ggc ctc ggc tgc ggt ctg aac ggc 480
Phe Thr Phe Asp Val Asp Val Thr Gly Leu Gly Cys Gly Leu Asn Gly
145 150 155 160

gcc ctc tac ttc gtc tcc atg gac gag gac ggt ggt ctt tcc aag tac 528
Ala Leu Tyr Phe Val Ser Met Asp Glu Asp Gly Gly Leu Ser Lys Tyr
165 170 175

tcg ggc aac aag gct ggc gcc aag tac ggc acc ggc tac tgc gac tcg 576
Ser Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser
180 185 190

cag tgc ccc cgc gac ctc aag ttc atc aac ggc gag gct aac aac gtt 624
Gln Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Glu Ala Asn Asn Val
195 200 205

ggc tgg acc ccg tcg tcc aac gac aag aac gcc ggc ttg ggc aac tac 672
Gly Trp Thr Pro Ser Ser Asn Asp Lys Asn Ala Gly Leu Gly Asn Tyr
210 215 220

ggc agc tgc tgc tcc gag atg gat gtc tgg gag gcc aac agc atc tcg 720
Gly Ser Cys Cys Ser Glu Met Asp Val Trp Glu Ala Asn Ser Ile Ser
225 230 235 240

gcg gcc tac acg ccc cat cct tgc act acc atc ggc cag acg cgc tgc 768
Ala Ala Tyr Thr Pro His Pro Cys Thr Thr Ile Gly Gln Thr Arg Cys
245 250 255

gag ggc gac gac tgc ggt ggt acc tac agc act gac cgc tac gcc ggc 816
Glu Gly Asp Asp Cys Gly Gly Thr Tyr Ser Thr Asp Arg Tyr Ala Gly
260 265 270

gag tgc gac cct gac gga tgc gac ttc aac tcg tac cgc atg ggc aac 864
Glu Cys Asp Pro Asp Gly Cys Asp Phe Asn Ser Tyr Arg Met Gly Asn
275 280 285

acg acc ttc tac ggc aag ggc atg acc gtc gac acc agc aag aag ttc 912
Thr Thr Phe Tyr Gly Lys Gly Met Thr Val Asp Thr Ser Lys Lys Phe
290 295 300

acg gtg gtg acc cag ttc ctg acg gac tcg tct ggc aac ctg tcc gag 960
Thr Val Val Thr Gln Phe Leu Thr Asp Ser Ser Gly Asn Leu Ser Glu
305 310 315 320

atc aag cgc ttc tac gtc cag aac ggc gtc gtc att ccc aac tcg aac 1008
Ile Lys Arg Phe Tyr Val Gln Asn Gly Val Val Ile Pro Asn Ser Asn
325 330 335

tcc aac atc gcg ggc gtc tcg ggc aac tcc atc acc cag gcc ttc tgc 1056
Ser Asn Ile Ala Gly Val Ser Gly Asn Ser Ile Thr Gln Ala Phe Cys
340 345 350

gat gct cag aag acc gct ttc ggc gac acc aac gtc ttc gac caa aag 1104
Asp Ala Gln Lys Thr Ala Phe Gly Asp Thr Asn Val Phe Asp Gln Lys
355 360 365

ggc ggc ctg gcc cag atg ggc aag gct ctt gcc cag ccc atg gtc ctc 1152
Gly Gly Leu Ala Gln Met Gly Lys Ala Leu Ala Gln Pro Met Val Leu
370 375 380

gtc atg tcc ctc tgg gac gac cac gcc gtc aac atg ctc tgg ctc gac 1200
Val Met Ser Leu Trp Asp Asp His Ala Val Asn Met Leu Trp Leu Asp
385 390 395 400

tcg acc tac ccg acc aac gcg gcc ggc aag ccg ggc gcc gcc cgc ggt 1248
Ser Thr Tyr Pro Thr Asn Ala Ala Gly Lys Pro Gly Ala Ala Arg Gly 
405 410 415 

acc tgc ccc acc acc teg ggc gtc ccc gec gac gtc gag tec cag gcg 1296 
Thr Cys Pro Thr Thr Ser Gly Val Pro Ala Asp Val Glu Ser Gin Ala 
420 425 430 

ccc aac tec aag gtc ate tac tec aac ate cgc ttc ggc ccc ate ggc 1344 
Pro Asn Ser Lys Val He Tyr Ser Asn He Arg Phe Gly Pro He Gly 
435 440 445 

tec acc gtc tec ggc ctg ccc ggc ggc ggc age aac ccc ggc ggc ggc 1392 
Ser Thr Val Ser Gly Leu Pro Gly Gly Gly Ser Asn Pro Gly Gly Gly 
450 455 , 460 

tec age tec acc acc acc acc acc aga ccc gee acc tec acc acc tec 1440 
Ser Ser Ser Thr Thr Thr Thr Thr Arg Pro Ala Thr Ser Thr Thr Ser 
465 470 475 480 

teg gee age tec ggc ccg acc ggc ggt ggc acg get gec cac tgg ggc 1488 
Ser Ala Ser Ser Gly Pro Thr Gly Gly Gly Thr Ala Ala His Trp Gly 
485 490 495 

cag tgc ggc ggc ate ggc tgg acc ggc ccg acc gtc tgc gee teg ccc 1536 
Gin Cys Gly Gly He Gly Trp Thr Gly Pro Thr Val Cys Ala Ser Pro 
500 505 510 

tac acc tgc cag aag ctg aac gac tgg tac tac cag tgc etc taa 1581 
Tyr Thr Cys Gin Lys Leu Asn Asp Trp Tyr Tyr Gin Cys Leu 
515 520 525 



<210> 2 
<211> 526 
<212> PRT 

<213> Acremonium thermophilum 
<400> 2 

Met His Ala Lys Phe Ala Thr Leu Ala Ala Leu Val Ala Ser Ala Ala
1 5 10 15

Ala Gln Gln Ala Cys Thr Leu Thr Ala Glu Asn His Pro Thr Leu Ser
20 25 30

Trp Ser Lys Cys Thr Ser Gly Gly Ser Cys Thr Ser Val Ser Gly Ser
35 40 45

Val Thr Ile Asp Ala Asn Trp Arg Trp Thr His Gln Val Ser Ser Ser
50 55 60

Thr Asn Cys Tyr Thr Gly Asn Glu Trp Asp Thr Ser Ile Cys Thr Asp
65 70 75 80

Gly Ala Ser Cys Ala Ala Ala Cys Cys Leu Asp Gly Ala Asp Tyr Ser
85 90 95

Gly Thr Tyr Gly Ile Thr Thr Ser Gly Asn Ala Leu Ser Leu Gln Phe
100 105 110

Val Thr Gln Gly Pro Tyr Ser Thr Asn Ile Gly Ser Arg Thr Tyr Leu
115 120 125

Met Ala Ser Asp Thr Lys Tyr Gln Met Phe Thr Leu Leu Gly Asn Glu
130 135 140

Phe Thr Phe Asp Val Asp Val Thr Gly Leu Gly Cys Gly Leu Asn Gly
145 150 155 160

Ala Leu Tyr Phe Val Ser Met Asp Glu Asp Gly Gly Leu Ser Lys Tyr
165 170 175

Ser Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser
180 185 190

Gln Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Glu Ala Asn Asn Val
195 200 205

Gly Trp Thr Pro Ser Ser Asn Asp Lys Asn Ala Gly Leu Gly Asn Tyr
210 215 220

Gly Ser Cys Cys Ser Glu Met Asp Val Trp Glu Ala Asn Ser Ile Ser
225 230 235 240

Ala Ala Tyr Thr Pro His Pro Cys Thr Thr Ile Gly Gln Thr Arg Cys
245 250 255

Glu Gly Asp Asp Cys Gly Gly Thr Tyr Ser Thr Asp Arg Tyr Ala Gly
260 265 270

Glu Cys Asp Pro Asp Gly Cys Asp Phe Asn Ser Tyr Arg Met Gly Asn
275 280 285

Thr Thr Phe Tyr Gly Lys Gly Met Thr Val Asp Thr Ser Lys Lys Phe
290 295 300

Thr Val Val Thr Gln Phe Leu Thr Asp Ser Ser Gly Asn Leu Ser Glu
305 310 315 320

Ile Lys Arg Phe Tyr Val Gln Asn Gly Val Val Ile Pro Asn Ser Asn
325 330 335

Ser Asn Ile Ala Gly Val Ser Gly Asn Ser Ile Thr Gln Ala Phe Cys
340 345 350

Asp Ala Gln Lys Thr Ala Phe Gly Asp Thr Asn Val Phe Asp Gln Lys
355 360 365

Gly Gly Leu Ala Gln Met Gly Lys Ala Leu Ala Gln Pro Met Val Leu
370 375 380

Val Met Ser Leu Trp Asp Asp His Ala Val Asn Met Leu Trp Leu Asp
385 390 395 400

Ser Thr Tyr Pro Thr Asn Ala Ala Gly Lys Pro Gly Ala Ala Arg Gly
405 410 415

Thr Cys Pro Thr Thr Ser Gly Val Pro Ala Asp Val Glu Ser Gln Ala
420 425 430

Pro Asn Ser Lys Val Ile Tyr Ser Asn Ile Arg Phe Gly Pro Ile Gly
435 440 445

Ser Thr Val Ser Gly Leu Pro Gly Gly Gly Ser Asn Pro Gly Gly Gly
450 455 460

Ser Ser Ser Thr Thr Thr Thr Thr Arg Pro Ala Thr Ser Thr Thr Ser
465 470 475 480

Ser Ala Ser Ser Gly Pro Thr Gly Gly Gly Thr Ala Ala His Trp Gly
485 490 495

Gln Cys Gly Gly Ile Gly Trp Thr Gly Pro Thr Val Cys Ala Ser Pro
500 505 510

Tyr Thr Cys Gln Lys Leu Asn Asp Trp Tyr Tyr Gln Cys Leu
515 520 525



<210> 3 

<211> 1590 

<212> DNA 

<213> Chaetomium thermophilum 
<220> 

<221> CDS 

<222> (1)..(1590) 

<223> 

<400> 3 

atg atg tac aag aag ttc gcc gct ctc gcc gcc ctc gtg gct ggc gcc 48
Met Met Tyr Lys Lys Phe Ala Ala Leu Ala Ala Leu Val Ala Gly Ala
1 5 10 15

gcc gcc cag cag gct tgc tcc ctc acc act gag acc cac ccc aga ctc 96
Ala Ala Gln Gln Ala Cys Ser Leu Thr Thr Glu Thr His Pro Arg Leu
20 25 30

act tgg aag cgc tgc acc tct ggc ggc aac tgc tcg acc gtg aac ggc 144
Thr Trp Lys Arg Cys Thr Ser Gly Gly Asn Cys Ser Thr Val Asn Gly
35 40 45

gcc gtc acc atc gat gcc aac tgg cgc tgg act cac acc gtt tcc ggc 192
Ala Val Thr Ile Asp Ala Asn Trp Arg Trp Thr His Thr Val Ser Gly
50 55 60

tcg acc aac tgc tac acc ggc aac gag tgg gat acc tcc atc tgc tct 240
Ser Thr Asn Cys Tyr Thr Gly Asn Glu Trp Asp Thr Ser Ile Cys Ser
65 70 75 80

gat ggc aag agc tgc gcc cag acc tgc tgc gtc gac ggc gct gac tac 288
Asp Gly Lys Ser Cys Ala Gln Thr Cys Cys Val Asp Gly Ala Asp Tyr
85 90 95

tct tcg acc tat ggt atc acc acc agc ggt gac tcc ctg aac ctc aag 336
Ser Ser Thr Tyr Gly Ile Thr Thr Ser Gly Asp Ser Leu Asn Leu Lys
100 105 110

ttc gtc acc aag cac cag tac ggc acc aat gtc ggc tct cgt gtc tac 384
Phe Val Thr Lys His Gln Tyr Gly Thr Asn Val Gly Ser Arg Val Tyr
115 120 125

ctg atg gag aac gac acc aag tac cag atg ttc gag ctc ctc ggc aac 432
Leu Met Glu Asn Asp Thr Lys Tyr Gln Met Phe Glu Leu Leu Gly Asn
130 135 140

gag ttc acc ttc gat gtc gat gtc tct aac ctg ggc tgc ggt ctc aac 480
Glu Phe Thr Phe Asp Val Asp Val Ser Asn Leu Gly Cys Gly Leu Asn
145 150 155 160

ggt gcc ctc tac ttc gtc tcc atg gac gct gat ggt ggt atg agc aag 528
Gly Ala Leu Tyr Phe Val Ser Met Asp Ala Asp Gly Gly Met Ser Lys
165 170 175

tac tct ggc aac aag gct ggc gcc aag tac ggg acg ggg tac tgt gat 576
Tyr Ser Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp
180 185 190

gct cag tgc ccg cgc gac ctt aag ttc atc aac ggc gag gcc aac att 624
Ala Gln Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Glu Ala Asn Ile
195 200 205

gag aac tgg acc cct tcg acc aat gat gcc aac gcc ggt ttc ggc cgc 672
Glu Asn Trp Thr Pro Ser Thr Asn Asp Ala Asn Ala Gly Phe Gly Arg
210 215 220

tat ggc agc tgc tgc tct gag atg gat atc tgg gag gcc aac aac atg 720
Tyr Gly Ser Cys Cys Ser Glu Met Asp Ile Trp Glu Ala Asn Asn Met
225 230 235 240

gct act gcc ttc act cct cac cct tgc acc att atc ggc cag agc cgc 768
Ala Thr Ala Phe Thr Pro His Pro Cys Thr Ile Ile Gly Gln Ser Arg
245 250 255

tgc gag ggc aac agc tgc ggt ggc acc tac agc tct gag cgc tat gct 816
Cys Glu Gly Asn Ser Cys Gly Gly Thr Tyr Ser Ser Glu Arg Tyr Ala
260 265 270

ggt gtt tgc gat cct gat ggc tgc gac ttc aac gcc tac cgc cag ggc 864
Gly Val Cys Asp Pro Asp Gly Cys Asp Phe Asn Ala Tyr Arg Gln Gly
275 280 285

gac aag acc ttc tac ggc aag ggc atg acc gtc gac acc acc aag aag 912
Asp Lys Thr Phe Tyr Gly Lys Gly Met Thr Val Asp Thr Thr Lys Lys
290 295 300

atg acc gtc gtc acc cag ttc cac aag aac tcg gct ggc gtc ctc agc 960
Met Thr Val Val Thr Gln Phe His Lys Asn Ser Ala Gly Val Leu Ser
305 310 315 320

gag atc aag cgc ttc tac gtt cag gac ggc aag gtc att gcc aac gcc 1008
Glu Ile Lys Arg Phe Tyr Val Gln Asp Gly Lys Val Ile Ala Asn Ala
325 330 335

gag tcc aag atc ccc ggc aac ccc ggc aac tcc atc acc cag gag tgg 1056
Glu Ser Lys Ile Pro Gly Asn Pro Gly Asn Ser Ile Thr Gln Glu Trp
340 345 350

tgc gat gcc cag aag gtc gcc ttc ggt gac atc gat gac ttc aac cgc 1104
Cys Asp Ala Gln Lys Val Ala Phe Gly Asp Ile Asp Asp Phe Asn Arg
355 360 365

aag ggc ggt atg gct cag atg agc aag gcc ctc gaa ggc cct atg gtc 1152
Lys Gly Gly Met Ala Gln Met Ser Lys Ala Leu Glu Gly Pro Met Val
370 375 380

ctg gtc atg tcc gtc tgg gat gac cac tac gcc aac atg ctc tgg ctc 1200
Leu Val Met Ser Val Trp Asp Asp His Tyr Ala Asn Met Leu Trp Leu
385 390 395 400

gac tcg acc tac ccc atc gac aag gcc ggc acc ccc ggc gcc gag cgc 1248
Asp Ser Thr Tyr Pro Ile Asp Lys Ala Gly Thr Pro Gly Ala Glu Arg
405 410 415

ggt gct tgc ccg acc acc tcc ggt gtc cct gcc gag att gag gcc cag 1296
Gly Ala Cys Pro Thr Thr Ser Gly Val Pro Ala Glu Ile Glu Ala Gln
420 425 430

gtc ccc aac agc aac gtc atc ttc tcc aac atc cgc ttc ggc ccc atc 1344
Val Pro Asn Ser Asn Val Ile Phe Ser Asn Ile Arg Phe Gly Pro Ile
435 440 445

ggc tcg acc gtc cct ggc ctc gac ggc agc act ccc agc aac ccg acc 1392
Gly Ser Thr Val Pro Gly Leu Asp Gly Ser Thr Pro Ser Asn Pro Thr
450 455 460

gcc acc gtt gct cct ccc act tct acc acc agc gtg aga agc agc act 1440
Ala Thr Val Ala Pro Pro Thr Ser Thr Thr Ser Val Arg Ser Ser Thr
465 470 475 480

act cag att tcc acc ccg act agc cag ccc ggc ggc tgc acc acc cag 1488
Thr Gln Ile Ser Thr Pro Thr Ser Gln Pro Gly Gly Cys Thr Thr Gln
485 490 495

aag tgg ggc cag tgc ggt ggt atc ggc tac acc ggc tgc act aac tgc 1536
Lys Trp Gly Gln Cys Gly Gly Ile Gly Tyr Thr Gly Cys Thr Asn Cys
500 505 510

gtt gct ggc act acc tgc act gag ctc aac ccc tgg tac agc cag tgc 1584
Val Ala Gly Thr Thr Cys Thr Glu Leu Asn Pro Trp Tyr Ser Gln Cys
515 520 525

ctg taa 1590
Leu



<210> 4 

<211> 529 

<212> PRT 

<213> Chaetomium thermophilum 

<400> 4 



Met Met Tyr Lys Lys Phe Ala Ala Leu Ala Ala Leu Val Ala Gly Ala
1 5 10 15

Ala Ala Gln Gln Ala Cys Ser Leu Thr Thr Glu Thr His Pro Arg Leu
20 25 30

Thr Trp Lys Arg Cys Thr Ser Gly Gly Asn Cys Ser Thr Val Asn Gly
35 40 45

Ala Val Thr Ile Asp Ala Asn Trp Arg Trp Thr His Thr Val Ser Gly
50 55 60

Ser Thr Asn Cys Tyr Thr Gly Asn Glu Trp Asp Thr Ser Ile Cys Ser
65 70 75 80

Asp Gly Lys Ser Cys Ala Gln Thr Cys Cys Val Asp Gly Ala Asp Tyr
85 90 95

Ser Ser Thr Tyr Gly Ile Thr Thr Ser Gly Asp Ser Leu Asn Leu Lys
100 105 110

Phe Val Thr Lys His Gln Tyr Gly Thr Asn Val Gly Ser Arg Val Tyr
115 120 125

Leu Met Glu Asn Asp Thr Lys Tyr Gln Met Phe Glu Leu Leu Gly Asn
130 135 140

Glu Phe Thr Phe Asp Val Asp Val Ser Asn Leu Gly Cys Gly Leu Asn
145 150 155 160

Gly Ala Leu Tyr Phe Val Ser Met Asp Ala Asp Gly Gly Met Ser Lys
165 170 175

Tyr Ser Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp
180 185 190

Ala Gln Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Glu Ala Asn Ile
195 200 205

Glu Asn Trp Thr Pro Ser Thr Asn Asp Ala Asn Ala Gly Phe Gly Arg
210 215 220

Tyr Gly Ser Cys Cys Ser Glu Met Asp Ile Trp Glu Ala Asn Asn Met
225 230 235 240

Ala Thr Ala Phe Thr Pro His Pro Cys Thr Ile Ile Gly Gln Ser Arg
245 250 255

Cys Glu Gly Asn Ser Cys Gly Gly Thr Tyr Ser Ser Glu Arg Tyr Ala
260 265 270

Gly Val Cys Asp Pro Asp Gly Cys Asp Phe Asn Ala Tyr Arg Gln Gly
275 280 285

Asp Lys Thr Phe Tyr Gly Lys Gly Met Thr Val Asp Thr Thr Lys Lys
290 295 300

Met Thr Val Val Thr Gln Phe His Lys Asn Ser Ala Gly Val Leu Ser
305 310 315 320

Glu Ile Lys Arg Phe Tyr Val Gln Asp Gly Lys Val Ile Ala Asn Ala
325 330 335

Glu Ser Lys Ile Pro Gly Asn Pro Gly Asn Ser Ile Thr Gln Glu Trp
340 345 350

Cys Asp Ala Gln Lys Val Ala Phe Gly Asp Ile Asp Asp Phe Asn Arg
355 360 365

Lys Gly Gly Met Ala Gln Met Ser Lys Ala Leu Glu Gly Pro Met Val
370 375 380

Leu Val Met Ser Val Trp Asp Asp His Tyr Ala Asn Met Leu Trp Leu
385 390 395 400




Asp Ser Thr Tyr Pro Ile Asp Lys Ala Gly Thr Pro Gly Ala Glu Arg
405 410 415

Gly Ala Cys Pro Thr Thr Ser Gly Val Pro Ala Glu Ile Glu Ala Gln
420 425 430

Val Pro Asn Ser Asn Val Ile Phe Ser Asn Ile Arg Phe Gly Pro Ile
435 440 445

Gly Ser Thr Val Pro Gly Leu Asp Gly Ser Thr Pro Ser Asn Pro Thr
450 455 460

Ala Thr Val Ala Pro Pro Thr Ser Thr Thr Ser Val Arg Ser Ser Thr
465 470 475 480

Thr Gln Ile Ser Thr Pro Thr Ser Gln Pro Gly Gly Cys Thr Thr Gln
485 490 495

Lys Trp Gly Gln Cys Gly Gly Ile Gly Tyr Thr Gly Cys Thr Asn Cys
500 505 510

Val Ala Gly Thr Thr Cys Thr Glu Leu Asn Pro Trp Tyr Ser Gln Cys
515 520 525

Leu



<210> 5 

<211> 1356 

<212> DNA 

<213> Scytalidium sp. 
<220> 

<221> CDS 

<222> (1)..(1356) 

<223> 

<400> 5 

atg cag atc aag agc tac atc cag tac ctg gcc gcg gct ctg ccg ctc 48
Met Gln Ile Lys Ser Tyr Ile Gln Tyr Leu Ala Ala Ala Leu Pro Leu
1 5 10 15

ctg agc agc gtc gct gcc cag cag gcc ggc acc atc acc gcc gag aac 96
Leu Ser Ser Val Ala Ala Gln Gln Ala Gly Thr Ile Thr Ala Glu Asn
20 25 30

cac ccc agg atg acc tgg aag agg tgc tcg ggc ccc ggc aac tgc cag 144
His Pro Arg Met Thr Trp Lys Arg Cys Ser Gly Pro Gly Asn Cys Gln
35 40 45

acc gtg cag ggc gag gtc gtc atc gac gcc aac tgg cgc tgg ctg cac 192
Thr Val Gln Gly Glu Val Val Ile Asp Ala Asn Trp Arg Trp Leu His
50 55 60

aac aac ggc cag aac tgc tat gag ggc aac aag tgg acc agc cag tgc 240
Asn Asn Gly Gln Asn Cys Tyr Glu Gly Asn Lys Trp Thr Ser Gln Cys
65 70 75 80

agc tcg gcc acc gac tgc gcg cag agg tgc gcc ctc gac ggt gcc aac 288
Ser Ser Ala Thr Asp Cys Ala Gln Arg Cys Ala Leu Asp Gly Ala Asn
85 90 95

tac cag tcg acc tac ggc gcc tcg acc agc ggc gac tcc ctg acg ctc 336
Tyr Gln Ser Thr Tyr Gly Ala Ser Thr Ser Gly Asp Ser Leu Thr Leu
100 105 110

aag ttc gtc acc aag cac gag tac ggc acc aac atc ggc tcg cgc ttc 384
Lys Phe Val Thr Lys His Glu Tyr Gly Thr Asn Ile Gly Ser Arg Phe
115 120 125

tac ctc atg gcc aac cag aac aag tac cag atg ttc acc ctg atg aac 432
Tyr Leu Met Ala Asn Gln Asn Lys Tyr Gln Met Phe Thr Leu Met Asn
130 135 140

aac gag ttc gcc ttc gat gtc gac ctc tcc aag gtt gag tgc ggt atc 480
Asn Glu Phe Ala Phe Asp Val Asp Leu Ser Lys Val Glu Cys Gly Ile
145 150 155 160

aac agc gct ctg tac ttc gtc gcc atg gag gag gat ggt ggc atg gcc 528
Asn Ser Ala Leu Tyr Phe Val Ala Met Glu Glu Asp Gly Gly Met Ala
165 170 175

agc tac ccg agc aac cgt gct ggt gcc aag tac ggc acg ggc tac tgc 576
Ser Tyr Pro Ser Asn Arg Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys
180 185 190

gat gcc caa tgc gcc cgt gac ctc aag ttc att ggc ggc aag gcc aac 624
Asp Ala Gln Cys Ala Arg Asp Leu Lys Phe Ile Gly Gly Lys Ala Asn
195 200 205

att gag ggc tgg cgc ccg tcc acc aac gac ccc aac gcc ggt gtc ggt 672
Ile Glu Gly Trp Arg Pro Ser Thr Asn Asp Pro Asn Ala Gly Val Gly
210 215 220

ccc atg ggt gcc tgc tgc gct gag atc gac gtt tgg gag tcc aac gcc 720
Pro Met Gly Ala Cys Cys Ala Glu Ile Asp Val Trp Glu Ser Asn Ala
225 230 235 240

tat gct tat gcc ttc acc ccc cac gcc tgc ggc agc aag aac cgc tac 768
Tyr Ala Tyr Ala Phe Thr Pro His Ala Cys Gly Ser Lys Asn Arg Tyr
245 250 255

cac atc tgc gag acc aac aac tgc ggt ggt acc tac tcg gat gac cgc 816
His Ile Cys Glu Thr Asn Asn Cys Gly Gly Thr Tyr Ser Asp Asp Arg
260 265 270

ttc gcc ggc tac tgc gac gcc aac ggc tgc gac tac aac ccc tac cgc 864
Phe Ala Gly Tyr Cys Asp Ala Asn Gly Cys Asp Tyr Asn Pro Tyr Arg
275 280 285

atg ggc aac aag gac ttc tat ggc aag ggc aag acc gtc gac acc aac 912
Met Gly Asn Lys Asp Phe Tyr Gly Lys Gly Lys Thr Val Asp Thr Asn
290 295 300

cgc aag ttc acc gtt gtc tcc cgc ttc gag cgt aac agg ctc tct cag 960
Arg Lys Phe Thr Val Val Ser Arg Phe Glu Arg Asn Arg Leu Ser Gln
305 310 315 320

ttc ttc gtc cag gac ggc cgc aag atc gag gtg ccc cct ccg acc tgg 1008
Phe Phe Val Gln Asp Gly Arg Lys Ile Glu Val Pro Pro Pro Thr Trp
325 330 335




ccc ggc ctc ccg aac agc gcc gac atc acc cct gag ctc tgc gat gct 1056
Pro Gly Leu Pro Asn Ser Ala Asp Ile Thr Pro Glu Leu Cys Asp Ala
340 345 350

cag ttc cgc gtc ttc gat gac cgc aac cgc ttc gcc gag acc ggt ggc 1104
Gln Phe Arg Val Phe Asp Asp Arg Asn Arg Phe Ala Glu Thr Gly Gly
355 360 365

ttc gat gct ctg aac gag gcc ctc acc att ccc atg gtc ctt gtc atg 1152
Phe Asp Ala Leu Asn Glu Ala Leu Thr Ile Pro Met Val Leu Val Met
370 375 380

tcc atc tgg gat gac cac cac tcc aac atg ctc tgg ctc gac tcc agc 1200
Ser Ile Trp Asp Asp His His Ser Asn Met Leu Trp Leu Asp Ser Ser
385 390 395 400

tac ccg ccc gag aag gcc ggc ctc ccc ggt ggc gac cgt ggc ccg tgc 1248
Tyr Pro Pro Glu Lys Ala Gly Leu Pro Gly Gly Asp Arg Gly Pro Cys
405 410 415

ccg acc acc tct ggt gtc cct gcc gag gtc gag gct cag tac ccc gat 1296
Pro Thr Thr Ser Gly Val Pro Ala Glu Val Glu Ala Gln Tyr Pro Asp
420 425 430

gct cag gtc gtc tgg tcc aac atc cgc ttc ggc ccc atc ggc tcg acc 1344
Ala Gln Val Val Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thr
435 440 445

gtc aac gtc taa 1356
Val Asn Val
450



<210> 6 

<211> 451 

<212> PRT 

<213> Scytalidium sp. 

<400> 6 

Met Gln Ile Lys Ser Tyr Ile Gln Tyr Leu Ala Ala Ala Leu Pro Leu
1 5 10 15

Leu Ser Ser Val Ala Ala Gln Gln Ala Gly Thr Ile Thr Ala Glu Asn
20 25 30

His Pro Arg Met Thr Trp Lys Arg Cys Ser Gly Pro Gly Asn Cys Gln
35 40 45

Thr Val Gln Gly Glu Val Val Ile Asp Ala Asn Trp Arg Trp Leu His
50 55 60

Asn Asn Gly Gln Asn Cys Tyr Glu Gly Asn Lys Trp Thr Ser Gln Cys
65 70 75 80

Ser Ser Ala Thr Asp Cys Ala Gln Arg Cys Ala Leu Asp Gly Ala Asn
85 90 95

Tyr Gln Ser Thr Tyr Gly Ala Ser Thr Ser Gly Asp Ser Leu Thr Leu
100 105 110

Lys Phe Val Thr Lys His Glu Tyr Gly Thr Asn Ile Gly Ser Arg Phe
115 120 125

Tyr Leu Met Ala Asn Gln Asn Lys Tyr Gln Met Phe Thr Leu Met Asn
130 135 140

Asn Glu Phe Ala Phe Asp Val Asp Leu Ser Lys Val Glu Cys Gly Ile
145 150 155 160

Asn Ser Ala Leu Tyr Phe Val Ala Met Glu Glu Asp Gly Gly Met Ala
165 170 175

Ser Tyr Pro Ser Asn Arg Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys
180 185 190

Asp Ala Gln Cys Ala Arg Asp Leu Lys Phe Ile Gly Gly Lys Ala Asn
195 200 205

Ile Glu Gly Trp Arg Pro Ser Thr Asn Asp Pro Asn Ala Gly Val Gly
210 215 220

Pro Met Gly Ala Cys Cys Ala Glu Ile Asp Val Trp Glu Ser Asn Ala
225 230 235 240

Tyr Ala Tyr Ala Phe Thr Pro His Ala Cys Gly Ser Lys Asn Arg Tyr
245 250 255

His Ile Cys Glu Thr Asn Asn Cys Gly Gly Thr Tyr Ser Asp Asp Arg
260 265 270

Phe Ala Gly Tyr Cys Asp Ala Asn Gly Cys Asp Tyr Asn Pro Tyr Arg
275 280 285

Met Gly Asn Lys Asp Phe Tyr Gly Lys Gly Lys Thr Val Asp Thr Asn
290 295 300

Arg Lys Phe Thr Val Val Ser Arg Phe Glu Arg Asn Arg Leu Ser Gln
305 310 315 320

Phe Phe Val Gln Asp Gly Arg Lys Ile Glu Val Pro Pro Pro Thr Trp
325 330 335

Pro Gly Leu Pro Asn Ser Ala Asp Ile Thr Pro Glu Leu Cys Asp Ala
340 345 350

Gln Phe Arg Val Phe Asp Asp Arg Asn Arg Phe Ala Glu Thr Gly Gly
355 360 365

Phe Asp Ala Leu Asn Glu Ala Leu Thr Ile Pro Met Val Leu Val Met
370 375 380

Ser Ile Trp Asp Asp His His Ser Asn Met Leu Trp Leu Asp Ser Ser
385 390 395 400

Tyr Pro Pro Glu Lys Ala Gly Leu Pro Gly Gly Asp Arg Gly Pro Cys
405 410 415

Pro Thr Thr Ser Gly Val Pro Ala Glu Val Glu Ala Gln Tyr Pro Asp
420 425 430

Ala Gln Val Val Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thr
435 440 445

Val Asn Val
450



<210> 7 

<211> 1374 

<212> DNA 

<213> Thermoascus aurantiacus 
<220> 

<221> CDS 

<222> (1)..(1374) 

<223> 

<400> 7 

atg tat cag cgc gct ctt ctc ttc tct ttc ttc ctc tcc gcc gcc cgc 48
Met Tyr Gln Arg Ala Leu Leu Phe Ser Phe Phe Leu Ser Ala Ala Arg
1 5 10 15

gcg cag cag gcc ggt acc cta acc gca gag aat cac cct tcc ctg acc 96
Ala Gln Gln Ala Gly Thr Leu Thr Ala Glu Asn His Pro Ser Leu Thr
20 25 30

tgg cag caa tgc tcc agc ggc ggt agt tgt acc acg cag aat gga aaa 144
Trp Gln Gln Cys Ser Ser Gly Gly Ser Cys Thr Thr Gln Asn Gly Lys
35 40 45

gtc gtt atc gat gcg aac tgg cgt tgg gtc cat acc acc tct gga tac 192
Val Val Ile Asp Ala Asn Trp Arg Trp Val His Thr Thr Ser Gly Tyr
50 55 60

acc aac tgc tac acg ggc aat acg tgg gac acc agt atc tgt ccc gac 240
Thr Asn Cys Tyr Thr Gly Asn Thr Trp Asp Thr Ser Ile Cys Pro Asp
65 70 75 80

gac gtg acc tgc gct cag aat tgt gcc ttg gat gga gcg gat tac agt 288
Asp Val Thr Cys Ala Gln Asn Cys Ala Leu Asp Gly Ala Asp Tyr Ser
85 90 95

ggc acc tat ggt gtt acg acc agt ggc aac gcc ctg aga ctg aac ttt 336
Gly Thr Tyr Gly Val Thr Thr Ser Gly Asn Ala Leu Arg Leu Asn Phe
100 105 110

gtc acc caa agc tca ggg aag aac att ggc tcg cgc ctg tac ctg ctg 384
Val Thr Gln Ser Ser Gly Lys Asn Ile Gly Ser Arg Leu Tyr Leu Leu
115 120 125

cag gac gac acc act tat cag atc ttc aag ctg ctg ggt cag gag ttt 432
Gln Asp Asp Thr Thr Tyr Gln Ile Phe Lys Leu Leu Gly Gln Glu Phe
130 135 140

acc ttc gat gtc gac gtc tcc aat ctc cct tgc ggg ctg aac ggc gcc 480
Thr Phe Asp Val Asp Val Ser Asn Leu Pro Cys Gly Leu Asn Gly Ala
145 150 155 160

ctc tac ttt gtg gcc atg gac gcc gac ggc gga ttg tcc aaa tac cct 528
Leu Tyr Phe Val Ala Met Asp Ala Asp Gly Gly Leu Ser Lys Tyr Pro
165 170 175

ggc aac aag gca ggc gct aag tat ggc act ggt tac tgc gac tct cag 576
Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser Gln
180 185 190

tgc cct cgg gat ctc aag ttc atc aac ggt cag gcc aac gtt gaa ggc 624
Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Gln Ala Asn Val Glu Gly
195 200 205

tgg cag ccg tct gcc aac gac cca aat gcc ggc gtt ggt aac cac ggt 672
Trp Gln Pro Ser Ala Asn Asp Pro Asn Ala Gly Val Gly Asn His Gly
210 215 220

tcc tgc tgc gct gag atg gat gtc tgg gaa gcc aac agc atc tct act 720
Ser Cys Cys Ala Glu Met Asp Val Trp Glu Ala Asn Ser Ile Ser Thr
225 230 235 240

gcg gtg acg cct cac cca tgc gac acc ccc ggc cag acc atg tgc cag 768
Ala Val Thr Pro His Pro Cys Asp Thr Pro Gly Gln Thr Met Cys Gln
245 250 255

gga gac gac tgt ggt gga acc tac tcc tcc act cga tat gct ggt acc 816
Gly Asp Asp Cys Gly Gly Thr Tyr Ser Ser Thr Arg Tyr Ala Gly Thr
260 265 270

tgc gac cct gat ggc tgc gac ttc aat cct tac cgc cag ggc aac cac 864
Cys Asp Pro Asp Gly Cys Asp Phe Asn Pro Tyr Arg Gln Gly Asn His
275 280 285

tcg ttc tac ggc ccc ggg aag atc gtc gac act agc tcc aaa ttc acc 912
Ser Phe Tyr Gly Pro Gly Lys Ile Val Asp Thr Ser Ser Lys Phe Thr
290 295 300

gtc gtc acc cag ttc atc acc gac gac ggg acc ccc tcc ggc acc ctg 960
Val Val Thr Gln Phe Ile Thr Asp Asp Gly Thr Pro Ser Gly Thr Leu
305 310 315 320

acg gag atc aaa cgc ttc tac gtc cag aac ggc aag gtg atc ccc cag 1008
Thr Glu Ile Lys Arg Phe Tyr Val Gln Asn Gly Lys Val Ile Pro Gln
325 330 335

tcg gag tcg acg atc agc ggc gtc acc ggc aac tca atc acc acc gag 1056
Ser Glu Ser Thr Ile Ser Gly Val Thr Gly Asn Ser Ile Thr Thr Glu
340 345 350

tat tgc acg gcc cag aag gcc gcc ttc ggc gac aac acc ggc ttc ttc 1104
Tyr Cys Thr Ala Gln Lys Ala Ala Phe Gly Asp Asn Thr Gly Phe Phe
355 360 365

acg cac ggc ggg ctt cag aag atc agt cag gct ctg gct cag ggc atg 1152
Thr His Gly Gly Leu Gln Lys Ile Ser Gln Ala Leu Ala Gln Gly Met
370 375 380

gtc ctc gtc atg agc ctg tgg gac gat cac gcc gcc aac atg ctc tgg 1200
Val Leu Val Met Ser Leu Trp Asp Asp His Ala Ala Asn Met Leu Trp
385 390 395 400

ctg gac agc acc tac ccg act gat gcg gac ccg gac acc cct ggc gtc 1248
Leu Asp Ser Thr Tyr Pro Thr Asp Ala Asp Pro Asp Thr Pro Gly Val
405 410 415

gcg cgc ggt acc tgc ccc acg acc tcc ggc gtc ccg gcc gac gtt gag 1296
Ala Arg Gly Thr Cys Pro Thr Thr Ser Gly Val Pro Ala Asp Val Glu
420 425 430

tcg cag aac ccc aat tca tat gtt atc tac tcc aac atc aag gtc gga 1344
Ser Gln Asn Pro Asn Ser Tyr Val Ile Tyr Ser Asn Ile Lys Val Gly
435 440 445

ccc atc aac tcg acc ttc acc gcc aac taa 1374
Pro Ile Asn Ser Thr Phe Thr Ala Asn
450 455



<210> 8 

<211> 457 

<212> PRT 

<213> Thermoascus aurantiacus 

<400> 8 

Met Tyr Gln Arg Ala Leu Leu Phe Ser Phe Phe Leu Ser Ala Ala Arg
1 5 10 15

Ala Gln Gln Ala Gly Thr Leu Thr Ala Glu Asn His Pro Ser Leu Thr
20 25 30

Trp Gln Gln Cys Ser Ser Gly Gly Ser Cys Thr Thr Gln Asn Gly Lys
35 40 45

Val Val Ile Asp Ala Asn Trp Arg Trp Val His Thr Thr Ser Gly Tyr
50 55 60

Thr Asn Cys Tyr Thr Gly Asn Thr Trp Asp Thr Ser Ile Cys Pro Asp
65 70 75 80

Asp Val Thr Cys Ala Gln Asn Cys Ala Leu Asp Gly Ala Asp Tyr Ser
85 90 95

Gly Thr Tyr Gly Val Thr Thr Ser Gly Asn Ala Leu Arg Leu Asn Phe
100 105 110

Val Thr Gln Ser Ser Gly Lys Asn Ile Gly Ser Arg Leu Tyr Leu Leu
115 120 125

Gln Asp Asp Thr Thr Tyr Gln Ile Phe Lys Leu Leu Gly Gln Glu Phe
130 135 140

Thr Phe Asp Val Asp Val Ser Asn Leu Pro Cys Gly Leu Asn Gly Ala
145 150 155 160

Leu Tyr Phe Val Ala Met Asp Ala Asp Gly Gly Leu Ser Lys Tyr Pro
165 170 175

Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser Gln
180 185 190

Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Gln Ala Asn Val Glu Gly
195 200 205

Trp Gln Pro Ser Ala Asn Asp Pro Asn Ala Gly Val Gly Asn His Gly
210 215 220

Ser Cys Cys Ala Glu Met Asp Val Trp Glu Ala Asn Ser Ile Ser Thr
225 230 235 240

Ala Val Thr Pro His Pro Cys Asp Thr Pro Gly Gln Thr Met Cys Gln
245 250 255

Gly Asp Asp Cys Gly Gly Thr Tyr Ser Ser Thr Arg Tyr Ala Gly Thr
260 265 270

Cys Asp Pro Asp Gly Cys Asp Phe Asn Pro Tyr Arg Gln Gly Asn His
275 280 285

Ser Phe Tyr Gly Pro Gly Lys Ile Val Asp Thr Ser Ser Lys Phe Thr
290 295 300

Val Val Thr Gln Phe Ile Thr Asp Asp Gly Thr Pro Ser Gly Thr Leu
305 310 315 320

Thr Glu Ile Lys Arg Phe Tyr Val Gln Asn Gly Lys Val Ile Pro Gln
325 330 335

Ser Glu Ser Thr Ile Ser Gly Val Thr Gly Asn Ser Ile Thr Thr Glu
340 345 350

Tyr Cys Thr Ala Gln Lys Ala Ala Phe Gly Asp Asn Thr Gly Phe Phe
355 360 365

Thr His Gly Gly Leu Gln Lys Ile Ser Gln Ala Leu Ala Gln Gly Met
370 375 380

Val Leu Val Met Ser Leu Trp Asp Asp His Ala Ala Asn Met Leu Trp
385 390 395 400

Leu Asp Ser Thr Tyr Pro Thr Asp Ala Asp Pro Asp Thr Pro Gly Val
405 410 415

Ala Arg Gly Thr Cys Pro Thr Thr Ser Gly Val Pro Ala Asp Val Glu
420 425 430

Ser Gln Asn Pro Asn Ser Tyr Val Ile Tyr Ser Asn Ile Lys Val Gly
435 440 445

Pro Ile Asn Ser Thr Phe Thr Ala Asn
450 455



<210> 9 

<211> 1617 

<212> DNA 

<213> Thielavia australiensis 
<220> 

<221> CDS 

<222> (1)..(1617) 

<223> 

<400> 9 

atg tat gcc aag ttc gcg acc etc gec gec etc gtg get ggc gec tec 48 
Met Tyr Ala Lys Phe Ala Thr Leu Ala Ala Leu Val Ala Gly Ala Ser 
15 10 15 

gcc cag gcc gtc tgc age ctt acc get gag acg cac cct tec ctg acg 96 
Ala Gin Ala Val Cys Ser Leu Thr Ala Glu Thr His Pro Ser Leu Thr 
20 25 30 
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tgg cag aag tgc acg gcc ccc ggc age tgc acc aac gtc gec ggc tec 144 
Trp Gin Lys Cys Thr Ala Pro Gly Ser Cys Thr Asn Val Ala Gly Ser 
35 40 45 

ate acc ate gac gcc aac tgg cgc tgg act cac cag acc teg tec gcg 192 
He Thr He Asp Ala Asn Trp Arg Trp Thr His Gin Thr Ser Ser Ala 
50 55 60 

acc aac tgc tac age ggc age aag tgg gac teg tec at.c tgc acg acc 240 
Thr Asn Cys Tyr Ser Gly Ser Lys Trp Asp Ser Ser He Cys Thr Thr 
65 70 75 80 

ggc acc gac tgc gcc tec aag tgc tgc att gat ggc gcc gag tac teg 288 
Gly Thr Asp Cys Ala Ser Lys Cys Cys He Asp Gly Ala Glu Tyr Ser 
85 90 95 

age acc tac ggc ate acc acc age ggc aat gcc ctg aac etc aag ttc 336 
Ser Thr Tyr Gly He Thr Thr Ser Gly Asn Ala Leu Asn Leu Lys Phe 
100 105 no 

gtc acc aag ggc cag tac teg acc aac att ggc teg cgt acc tac etc 384 
Val Thr Lys Gly Gin Tyr Ser Thr Asn He Gly Ser Arg Thr Tyr Leu 
115 120 125 

atg gag teg gac acc aag tac cag atg ttc aag etc ctt ggc aac gag 432 
Met Glu Ser Asp Thr Lys Tyr Gin Met Phe Lys Leu Leu Gly Asn Glu 
130 135 * 140 

ttc acc ttc gac gtc gat gtc tec aac etc ggc tgc ggc etc aac ggc 480 
Phe Thr Phe Asp Val Asp Val Ser Asn Leu Gly Cys Gly Leu Asn Gly 
145 150 155 160 

gcc ctg tac ttc gtc tec atg gat gcc gac ggt ggc atg tec aag tac 528 
Ala Leu Tyr Phe Val Ser Met Asp Ala Asp Gly Gly Met Ser Lys Tyr 
165 170 175 

teg ggc aac aag gcc ggt gcc aag tac ggt acc ggc tac tgc gat get 576 
Ser Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ala 
180 185 190 

cag tgc ccc cgc gac etc aag ttc ate aac ggc gag gcc aac gtt gag 624 
Gin Cys Pro Arg Asp Leu Lys Phe He Asn Gly Glu Ala Asn Val Glu 
195 200 205 

ggc tgg gag age teg acc aac gac gcc aac gcc ggc teg ggc aag tac 672 
Gly Trp Glu Ser Ser Thr Asn Asp Ala Asn Ala Gly Ser Gly Lys Tyr 
210 215 220 

ggc age tgc tgc acc gag atg gac gtc tgg gag gcc aac aac atg gcg 720 
Gly Ser Cys Cys Thr Glu Met Asp Val Trp Glu Ala Asn Asn Met Ala 
225 230 235 240 

act gcc ttc act cct cac cct tgc acc acc att ggc cag act cgc tgc 768 
Thr Ala Phe Thr Pro His Pro Cys Thr Thr He Gly Gin Thr Arg Cys 
245 250 255 

gag ggc gac acc tgc ggc ggc acc tac age tea gac cgc tac gcc ggc 816 
Glu Gly Asp Thr Cys Gly Gly Thr Tyr Ser Ser Asp Arg Tyr Ala Gly 
260 265 270 

gtc tgc gac ccc gac gga tgc gac ttc aac teg tac cgc cag ggc aac 864 
Val Cys Asp Pro Asp Gly Cys Asp Phe Asn Ser Tyr Arg Gin Gly Asn 
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aag acc ttc tac ggc aag ggc atg acc gtc gac acc acc aag aag ate 912 
Lys Thr Phe Tyr Gly Lys Gly Met Thr Val Asp Thr Thr Lys Lys lie 
290 295 300 

acg gtc gtc acc cag ttc ctc aag aac tcg gcc ggc gag ctc tcc gag 960
Thr Val Val Thr Gln Phe Leu Lys Asn Ser Ala Gly Glu Leu Ser Glu
305 310 315 320

atc aag cgc ttc tac gcc cag gac ggc aag gtc atc ccg aac agt gag 1008
Ile Lys Arg Phe Tyr Ala Gln Asp Gly Lys Val Ile Pro Asn Ser Glu
325 330 335

tct acc att gcc ggc atc ccc ggc aac tcc atc acc aag gcc tac tgc 1056
Ser Thr Ile Ala Gly Ile Pro Gly Asn Ser Ile Thr Lys Ala Tyr Cys
340 345 350

gac gcc cag aag acc gtc ttc cag aac acc gac gac ttc acc gcc aag 1104
Asp Ala Gln Lys Thr Val Phe Gln Asn Thr Asp Asp Phe Thr Ala Lys
355 360 365

ggc ggc ctc gtc cag atg ggc aag gcc ctc gcc ggc gac atg gtc ctc 1152
Gly Gly Leu Val Gln Met Gly Lys Ala Leu Ala Gly Asp Met Val Leu
370 375 380

gtc atg tcc gtc tgg gac gac cac gcc gtc aac atg ctc tgg cta gac 1200
Val Met Ser Val Trp Asp Asp His Ala Val Asn Met Leu Trp Leu Asp
385 390 395 400

tcg acc tac ccg acc gac cag gtc ggc gtt gcc ggc gct gag cgc ggc 1248
Ser Thr Tyr Pro Thr Asp Gln Val Gly Val Ala Gly Ala Glu Arg Gly
405 410 415

gcc tgc ccc acc acc tcg ggc gtc ccc tcg gat gtt gag gcc aac gcc 1296
Ala Cys Pro Thr Thr Ser Gly Val Pro Ser Asp Val Glu Ala Asn Ala
420 425 430

ccc aac tcc aac gtc atc ttc tcc aac atc cgc ttc ggc ccc atc ggc 1344
Pro Asn Ser Asn Val Ile Phe Ser Asn Ile Arg Phe Gly Pro Ile Gly
435 440 445

tcc acc gtc cag ggc ctg ccc agc tcc ggc ggc acc tcc agc agc tcg 1392
Ser Thr Val Gln Gly Leu Pro Ser Ser Gly Gly Thr Ser Ser Ser Ser
450 455 460

agc gcc gct ccc cag tcg acc agc acc aag gcc tcg acc acc acc tca 1440
Ser Ala Ala Pro Gln Ser Thr Ser Thr Lys Ala Ser Thr Thr Thr Ser
465 470 475 480

gct gtc cgc acc acc tcg act gcc acc acc aag acc acc tcc tcg gct 1488
Ala Val Arg Thr Thr Ser Thr Ala Thr Thr Lys Thr Thr Ser Ser Ala
485 490 495

ccc gcc cag ggc acc aac act gcc aag cat tgg cag caa tgc ggt ggt 1536
Pro Ala Gln Gly Thr Asn Thr Ala Lys His Trp Gln Gln Cys Gly Gly
500 505 510

aac ggc tgg acc ggc ccg acg gtg tgc gag tct ccc tac aag tgc acc 1584 
Asn Gly Trp Thr Gly Pro Thr Val Cys Glu Ser Pro Tyr Lys Cys Thr 
515 520 525 

aag cag aac gac tgg tac tcg cag tgc ctc taa 1617
Lys Gln Asn Asp Trp Tyr Ser Gln Cys Leu
530 535



<210> 10 

<211> 538 

<212> PRT 

<213> Thielavia australiensis 

<400> 10 



Met Tyr Ala Lys Phe Ala Thr Leu Ala Ala Leu Val Ala Gly Ala Ser
1 5 10 15

Ala Gln Ala Val Cys Ser Leu Thr Ala Glu Thr His Pro Ser Leu Thr
20 25 30

Trp Gln Lys Cys Thr Ala Pro Gly Ser Cys Thr Asn Val Ala Gly Ser
35 40 45

Ile Thr Ile Asp Ala Asn Trp Arg Trp Thr His Gln Thr Ser Ser Ala
50 55 60

Thr Asn Cys Tyr Ser Gly Ser Lys Trp Asp Ser Ser Ile Cys Thr Thr
65 70 75 80

Gly Thr Asp Cys Ala Ser Lys Cys Cys Ile Asp Gly Ala Glu Tyr Ser
85 90 95

Ser Thr Tyr Gly Ile Thr Thr Ser Gly Asn Ala Leu Asn Leu Lys Phe
100 105 110

Val Thr Lys Gly Gln Tyr Ser Thr Asn Ile Gly Ser Arg Thr Tyr Leu
115 120 125

Met Glu Ser Asp Thr Lys Tyr Gln Met Phe Lys Leu Leu Gly Asn Glu
130 135 140

Phe Thr Phe Asp Val Asp Val Ser Asn Leu Gly Cys Gly Leu Asn Gly
145 150 155 160

Ala Leu Tyr Phe Val Ser Met Asp Ala Asp Gly Gly Met Ser Lys Tyr
165 170 175

Ser Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ala
180 185 190

Gln Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Glu Ala Asn Val Glu
195 200 205

Gly Trp Glu Ser Ser Thr Asn Asp Ala Asn Ala Gly Ser Gly Lys Tyr
210 215 220

Gly Ser Cys Cys Thr Glu Met Asp Val Trp Glu Ala Asn Asn Met Ala
225 230 235 240

Thr Ala Phe Thr Pro His Pro Cys Thr Thr Ile Gly Gln Thr Arg Cys
245 250 255



Glu Gly Asp Thr Cys Gly Gly Thr Tyr Ser Ser Asp Arg Tyr Ala Gly
260 265 270

Val Cys Asp Pro Asp Gly Cys Asp Phe Asn Ser Tyr Arg Gln Gly Asn
275 280 285

Lys Thr Phe Tyr Gly Lys Gly Met Thr Val Asp Thr Thr Lys Lys Ile
290 295 300

Thr Val Val Thr Gln Phe Leu Lys Asn Ser Ala Gly Glu Leu Ser Glu
305 310 315 320

Ile Lys Arg Phe Tyr Ala Gln Asp Gly Lys Val Ile Pro Asn Ser Glu
325 330 335

Ser Thr Ile Ala Gly Ile Pro Gly Asn Ser Ile Thr Lys Ala Tyr Cys
340 345 350

Asp Ala Gln Lys Thr Val Phe Gln Asn Thr Asp Asp Phe Thr Ala Lys
355 360 365

Gly Gly Leu Val Gln Met Gly Lys Ala Leu Ala Gly Asp Met Val Leu
370 375 380

Val Met Ser Val Trp Asp Asp His Ala Val Asn Met Leu Trp Leu Asp
385 390 395 400

Ser Thr Tyr Pro Thr Asp Gln Val Gly Val Ala Gly Ala Glu Arg Gly
405 410 415

Ala Cys Pro Thr Thr Ser Gly Val Pro Ser Asp Val Glu Ala Asn Ala
420 425 430

Pro Asn Ser Asn Val Ile Phe Ser Asn Ile Arg Phe Gly Pro Ile Gly
435 440 445

Ser Thr Val Gln Gly Leu Pro Ser Ser Gly Gly Thr Ser Ser Ser Ser
450 455 460

Ser Ala Ala Pro Gln Ser Thr Ser Thr Lys Ala Ser Thr Thr Thr Ser
465 470 475 480

Ala Val Arg Thr Thr Ser Thr Ala Thr Thr Lys Thr Thr Ser Ser Ala
485 490 495

Pro Ala Gln Gly Thr Asn Thr Ala Lys His Trp Gln Gln Cys Gly Gly
500 505 510

Asn Gly Trp Thr Gly Pro Thr Val Cys Glu Ser Pro Tyr Lys Cys Thr
515 520 525

Lys Gln Asn Asp Trp Tyr Ser Gln Cys Leu
530 535



<210> 11 

<211> 1248 

<212> DNA 

<213> Verticillium tenerum 
<220> 

<221> CDS 

<222> (1)..(1248)

<223> 
<400> 11

atg aag aag gct ctc atc acc agc ctc tcc ctg ctg gcc acg gcc atg 48
Met Lys Lys Ala Leu Ile Thr Ser Leu Ser Leu Leu Ala Thr Ala Met
1 5 10 15

ggc cag cag gcc ggt acc ctc gag acc gag acg cat ccc aag ctg acc 96
Gly Gln Gln Ala Gly Thr Leu Glu Thr Glu Thr His Pro Lys Leu Thr
20 25 30

tgg cag cgc tgc acc acc tcc ggc tgt acc aac gtc aac ggc gag gtc 144
Trp Gln Arg Cys Thr Thr Ser Gly Cys Thr Asn Val Asn Gly Glu Val
35 40 45

gtc atc gac gcc aac tgg cgt tgg gcc cac gac atc aac ggc tac gag 192
Val Ile Asp Ala Asn Trp Arg Trp Ala His Asp Ile Asn Gly Tyr Glu
50 55 60

aac tgc ttc gag ggc aac acc tgg acc ggc acc tgc agc ggc gcc gac 240
Asn Cys Phe Glu Gly Asn Thr Trp Thr Gly Thr Cys Ser Gly Ala Asp
65 70 75 80

ggc tgc gcg aag aac tgc gcc gtc gag gga gcc aac tac cag tcg acc 288
Gly Cys Ala Lys Asn Cys Ala Val Glu Gly Ala Asn Tyr Gln Ser Thr
85 90 95

tac ggt gtc tcg acc agc ggc aac gcc ctc tcc ctg cgc ttc gtc acc 336
Tyr Gly Val Ser Thr Ser Gly Asn Ala Leu Ser Leu Arg Phe Val Thr
100 105 110



gag cac gag cac ggc gtc aac acc ggt tcg cgc acg tac ctc atg gag 384
Glu His Glu His Gly Val Asn Thr Gly Ser Arg Thr Tyr Leu Met Glu
115 120 125

agc gcc acc aag tac cag atg ttc acc ctg atg aac aac gag ctc gcc 432
Ser Ala Thr Lys Tyr Gln Met Phe Thr Leu Met Asn Asn Glu Leu Ala
130 135 140

ttc gac gtc gac ctg tcc aag gtc gcc tgc ggc atg aac agc gcc ctc 480
Phe Asp Val Asp Leu Ser Lys Val Ala Cys Gly Met Asn Ser Ala Leu
145 150 155 160

tac ctc gtc ccc atg aag gcc gac ggc ggt ctc tcg tcc gag acc aac 528
Tyr Leu Val Pro Met Lys Ala Asp Gly Gly Leu Ser Ser Glu Thr Asn
165 170 175

aac aac gcc ggc gcc aag tac ggt acc ggt tac tgc gac gcc cag tgc 576
Asn Asn Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ala Gln Cys
180 185 190

gct cgc gat ctc aag ttc gtc aac ggc aag gcc aac atc gag ggc tgg 624
Ala Arg Asp Leu Lys Phe Val Asn Gly Lys Ala Asn Ile Glu Gly Trp
195 200 205



caa gcc tcc aag acc gac gag aac tct ggc gtc ggt aac atg ggc tcc 672
Gln Ala Ser Lys Thr Asp Glu Asn Ser Gly Val Gly Asn Met Gly Ser
210 215 220

tgc tgt gct gag att gac gtt tgg gag tcc aac cgc gag tct ttc gcc 720
Cys Cys Ala Glu Ile Asp Val Trp Glu Ser Asn Arg Glu Ser Phe Ala
225 230 235 240

ttc acc cct cac gct tgc tcg cag aac gag tac cac gtc tgc acc ggc 768
Phe Thr Pro His Ala Cys Ser Gln Asn Glu Tyr His Val Cys Thr Gly
245 250 255

gcc aac tgc ggc ggt acc tac tcg gac gac cgc ttc gcc ggc aag tgc 816
Ala Asn Cys Gly Gly Thr Tyr Ser Asp Asp Arg Phe Ala Gly Lys Cys
260 265 270

gat gcc aac ggt tgc gac tac aac ccc ttc cgc gtg ggc aac cag aac 864
Asp Ala Asn Gly Cys Asp Tyr Asn Pro Phe Arg Val Gly Asn Gln Asn
275 280 285

ttc tac ggc ccc ggc atg acc gtc aac acc aac tcc aag ttc act gtc 912
Phe Tyr Gly Pro Gly Met Thr Val Asn Thr Asn Ser Lys Phe Thr Val
290 295 300

atc tct cgc ttc cgg gag aac gag gcc tac cag gtc ttc atc cag aac 960
Ile Ser Arg Phe Arg Glu Asn Glu Ala Tyr Gln Val Phe Ile Gln Asn
305 310 315 320

ggc cgc acc atc gag gtc ccc cgt ccc acc ctc tcc ggc atc acc cag 1008
Gly Arg Thr Ile Glu Val Pro Arg Pro Thr Leu Ser Gly Ile Thr Gln
325 330 335

ttc gag gcc aag atc acc ccc gag ttc tgc tcg acc tac ccc acc gtc 1056
Phe Glu Ala Lys Ile Thr Pro Glu Phe Cys Ser Thr Tyr Pro Thr Val
340 345 350

ttc ggc gac cgc gac cgc cac ggc gag atc ggc ggc cac acc gcc ctc 1104
Phe Gly Asp Arg Asp Arg His Gly Glu Ile Gly Gly His Thr Ala Leu
355 360 365

aac gcg gcc ctc cgc atg ccc atg gtc ctc gtc atg tcc atc tgg gcc 1152
Asn Ala Ala Leu Arg Met Pro Met Val Leu Val Met Ser Ile Trp Ala
370 375 380

gac cac tac gcc aac atg ctc tgg ctc gac tcc atc tac ccg cca gag 1200
Asp His Tyr Ala Asn Met Leu Trp Leu Asp Ser Ile Tyr Pro Pro Glu
385 390 395 400

aag agg ggc cag ccc ggc gcc cac cgc ggc cgc aga tct aga ggg tga 1248
Lys Arg Gly Gln Pro Gly Ala His Arg Gly Arg Arg Ser Arg Gly
405 410 415



<210> 12 
<211> 415 
<212> PRT 

<213> Verticillium tenerum 
<400> 12 

Met Lys Lys Ala Leu Ile Thr Ser Leu Ser Leu Leu Ala Thr Ala Met
1 5 10 15

Gly Gln Gln Ala Gly Thr Leu Glu Thr Glu Thr His Pro Lys Leu Thr
20 25 30

Trp Gln Arg Cys Thr Thr Ser Gly Cys Thr Asn Val Asn Gly Glu Val
35 40 45

Val Ile Asp Ala Asn Trp Arg Trp Ala His Asp Ile Asn Gly Tyr Glu
50 55 60

Asn Cys Phe Glu Gly Asn Thr Trp Thr Gly Thr Cys Ser Gly Ala Asp
65 70 75 80

Gly Cys Ala Lys Asn Cys Ala Val Glu Gly Ala Asn Tyr Gln Ser Thr
85 90 95

Tyr Gly Val Ser Thr Ser Gly Asn Ala Leu Ser Leu Arg Phe Val Thr
100 105 110

Glu His Glu His Gly Val Asn Thr Gly Ser Arg Thr Tyr Leu Met Glu
115 120 125

Ser Ala Thr Lys Tyr Gln Met Phe Thr Leu Met Asn Asn Glu Leu Ala
130 135 140

Phe Asp Val Asp Leu Ser Lys Val Ala Cys Gly Met Asn Ser Ala Leu
145 150 155 160

Tyr Leu Val Pro Met Lys Ala Asp Gly Gly Leu Ser Ser Glu Thr Asn
165 170 175

Asn Asn Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ala Gln Cys
180 185 190

Ala Arg Asp Leu Lys Phe Val Asn Gly Lys Ala Asn Ile Glu Gly Trp
195 200 205

Gln Ala Ser Lys Thr Asp Glu Asn Ser Gly Val Gly Asn Met Gly Ser
210 215 220

Cys Cys Ala Glu Ile Asp Val Trp Glu Ser Asn Arg Glu Ser Phe Ala
225 230 235 240

Phe Thr Pro His Ala Cys Ser Gln Asn Glu Tyr His Val Cys Thr Gly
245 250 255

Ala Asn Cys Gly Gly Thr Tyr Ser Asp Asp Arg Phe Ala Gly Lys Cys
260 265 270

Asp Ala Asn Gly Cys Asp Tyr Asn Pro Phe Arg Val Gly Asn Gln Asn
275 280 285

Phe Tyr Gly Pro Gly Met Thr Val Asn Thr Asn Ser Lys Phe Thr Val
290 295 300

Ile Ser Arg Phe Arg Glu Asn Glu Ala Tyr Gln Val Phe Ile Gln Asn
305 310 315 320

Gly Arg Thr Ile Glu Val Pro Arg Pro Thr Leu Ser Gly Ile Thr Gln
325 330 335

Phe Glu Ala Lys Ile Thr Pro Glu Phe Cys Ser Thr Tyr Pro Thr Val
340 345 350

Phe Gly Asp Arg Asp Arg His Gly Glu Ile Gly Gly His Thr Ala Leu
355 360 365

Asn Ala Ala Leu Arg Met Pro Met Val Leu Val Met Ser Ile Trp Ala
370 375 380

Asp His Tyr Ala Asn Met Leu Trp Leu Asp Ser Ile Tyr Pro Pro Glu
385 390 395 400

Lys Arg Gly Gln Pro Gly Ala His Arg Gly Arg Arg Ser Arg Gly
405 410 415



<210> 13 

<211> 1341 

<212> DNA 

<213> Neotermes castaneus 
<220> 

<221> CDS 

<222> (1)..(1341) 

<223> 

<400> 13 

gca cga ggg ctc gct gct gca ttg ttc acc ttt gca tgt agc gtt ggt 48
Ala Arg Gly Leu Ala Ala Ala Leu Phe Thr Phe Ala Cys Ser Val Gly
1 5 10 15

atc ggc acc aaa acg gcc gag aac cac ccg aag ctg aac tgg cag aac 96
Ile Gly Thr Lys Thr Ala Glu Asn His Pro Lys Leu Asn Trp Gln Asn
20 25 30

tgc gcc tcc aag ggc agc tgc tca caa gtg tcc ggc gaa gtg aca atg 144
Cys Ala Ser Lys Gly Ser Cys Ser Gln Val Ser Gly Glu Val Thr Met
35 40 45

gac tcg aac tgg cgg tgg acc cac gat ggc aac ggc aag aac tgc tac 192
Asp Ser Asn Trp Arg Trp Thr His Asp Gly Asn Gly Lys Asn Cys Tyr
50 55 60

gac ggc aac acc tgg atc tcc agc ctc tgc cca gac ggc aag acc tgc 240
Asp Gly Asn Thr Trp Ile Ser Ser Leu Cys Pro Asp Gly Lys Thr Cys
65 70 75 80

tct gac aag tgc gtc ctc gat ggc gcc gaa tac caa gcg acc tac ggc 288
Ser Asp Lys Cys Val Leu Asp Gly Ala Glu Tyr Gln Ala Thr Tyr Gly
85 90 95

atc acc tcg aac ggg acc gcg gtc acc ctc aag ttc gtc acc cac ggc 336
Ile Thr Ser Asn Gly Thr Ala Val Thr Leu Lys Phe Val Thr His Gly
100 105 110

tcg tac tcg acg aac atc ggc tcc cgc ctg tat ctc ctc aag gac gaa 384
Ser Tyr Ser Thr Asn Ile Gly Ser Arg Leu Tyr Leu Leu Lys Asp Glu
115 120 125

aac act tac tac atc ttc aag gtg aac aac aag gaa ttc aca ttc agc 432
Asn Thr Tyr Tyr Ile Phe Lys Val Asn Asn Lys Glu Phe Thr Phe Ser
130 135 140

gtc gat gtg tcg aag ctc ccg tgc ggc ctg aac ggt gcc ctc tac ttc 480
Val Asp Val Ser Lys Leu Pro Cys Gly Leu Asn Gly Ala Leu Tyr Phe
145 150 155 160

gtc tcg atg gac gcc gac ggt ggc gca gga aag tat tca ggt gcg aag 528
Val Ser Met Asp Ala Asp Gly Gly Ala Gly Lys Tyr Ser Gly Ala Lys
165 170 175

cca ggc gcg aag tac ggc ctc ggc tac tgc gat gcg caa tgc ccg agc 576
Pro Gly Ala Lys Tyr Gly Leu Gly Tyr Cys Asp Ala Gln Cys Pro Ser
180 185 190

gat ctg aag ttc atc aac ggc gaa gcg aac agc gat ggc tgg aag ccc 624
Asp Leu Lys Phe Ile Asn Gly Glu Ala Asn Ser Asp Gly Trp Lys Pro
195 200 205

cag gcg aac gac aag aat gcg gga aac ggc aaa tac gga tcg tgc tgc 672
Gln Ala Asn Asp Lys Asn Ala Gly Asn Gly Lys Tyr Gly Ser Cys Cys
210 215 220

tcg gaa atg gac gtt tgg gag gcg aac tcg cag gca aca gct tac act 720
Ser Glu Met Asp Val Trp Glu Ala Asn Ser Gln Ala Thr Ala Tyr Thr
225 230 235 240

ccg cac gtc tgc aag acc acg ggc cag cag cgc tgc tcg ggc aca tcg 768
Pro His Val Cys Lys Thr Thr Gly Gln Gln Arg Cys Ser Gly Thr Ser
245 250 255

gaa tgc ggc ggc cag gat ggc gca gcg cgt ttc cag gga ctg tgc gac 816
Glu Cys Gly Gly Gln Asp Gly Ala Ala Arg Phe Gln Gly Leu Cys Asp
260 265 270

gag gac ggt tgc gac ttc aac agc tgg cgc cag ggc gac aag acg ttc 864
Glu Asp Gly Cys Asp Phe Asn Ser Trp Arg Gln Gly Asp Lys Thr Phe
275 280 285

tac ggc ccg gga ttg act gtt gac acg aag tcg ccg ttc aca gtc gtc 912
Tyr Gly Pro Gly Leu Thr Val Asp Thr Lys Ser Pro Phe Thr Val Val
290 295 300

aca caa ttc gtc gga agt ccg gtg aag gaa atc cgc agg aag tac gtc 960
Thr Gln Phe Val Gly Ser Pro Val Lys Glu Ile Arg Arg Lys Tyr Val
305 310 315 320

cag aac gga aag gtg att gag aac tcg aag aac aag att tcg gga att 1008
Gln Asn Gly Lys Val Ile Glu Asn Ser Lys Asn Lys Ile Ser Gly Ile
325 330 335

gac gag acg aac gca gtg agt gat act ttc tgc gat cag caa aag aag 1056
Asp Glu Thr Asn Ala Val Ser Asp Thr Phe Cys Asp Gln Gln Lys Lys
340 345 350

gcc ttc ggt gat acg aac gat ttc aag aac aag ggc ggt ttc gct aag 1104
Ala Phe Gly Asp Thr Asn Asp Phe Lys Asn Lys Gly Gly Phe Ala Lys
355 360 365

ttg ggt cag gtg ttc gag act ggt cag gtt ctc gtg ctg tcg ctg tgg 1152
Leu Gly Gln Val Phe Glu Thr Gly Gln Val Leu Val Leu Ser Leu Trp
370 375 380

gat gac cac tcg gtt gca atg ctg tgg ttg gac tcg gcc tac cca acg 1200
Asp Asp His Ser Val Ala Met Leu Trp Leu Asp Ser Ala Tyr Pro Thr
385 390 395 400

aac aag gat aag agc agc cca ggt gtt gac cgt ggg cct tgc ccg acg 1248
Asn Lys Asp Lys Ser Ser Pro Gly Val Asp Arg Gly Pro Cys Pro Thr
405 410 415

act tcc ggg aag ccg gat gat gtt gaa agc caa tct ccc gat gca acc 1296
Thr Ser Gly Lys Pro Asp Asp Val Glu Ser Gln Ser Pro Asp Ala Thr
420 425 430

gtc att tat ggc aac atc aag ttc ggt gca ctg gac tcc act tac 1341
Val Ile Tyr Gly Asn Ile Lys Phe Gly Ala Leu Asp Ser Thr Tyr
435 440 445



<210> 14 

<211> 447 

<212> PRT 

<213> Neotermes castaneus 

<400> 14 

Ala Arg Gly Leu Ala Ala Ala Leu Phe Thr Phe Ala Cys Ser Val Gly
1 5 10 15

Ile Gly Thr Lys Thr Ala Glu Asn His Pro Lys Leu Asn Trp Gln Asn
20 25 30

Cys Ala Ser Lys Gly Ser Cys Ser Gln Val Ser Gly Glu Val Thr Met
35 40 45

Asp Ser Asn Trp Arg Trp Thr His Asp Gly Asn Gly Lys Asn Cys Tyr
50 55 60

Asp Gly Asn Thr Trp Ile Ser Ser Leu Cys Pro Asp Gly Lys Thr Cys
65 70 75 80

Ser Asp Lys Cys Val Leu Asp Gly Ala Glu Tyr Gln Ala Thr Tyr Gly
85 90 95

Ile Thr Ser Asn Gly Thr Ala Val Thr Leu Lys Phe Val Thr His Gly
100 105 110

Ser Tyr Ser Thr Asn Ile Gly Ser Arg Leu Tyr Leu Leu Lys Asp Glu
115 120 125

Asn Thr Tyr Tyr Ile Phe Lys Val Asn Asn Lys Glu Phe Thr Phe Ser
130 135 140

Val Asp Val Ser Lys Leu Pro Cys Gly Leu Asn Gly Ala Leu Tyr Phe
145 150 155 160

Val Ser Met Asp Ala Asp Gly Gly Ala Gly Lys Tyr Ser Gly Ala Lys
165 170 175

Pro Gly Ala Lys Tyr Gly Leu Gly Tyr Cys Asp Ala Gln Cys Pro Ser
180 185 190

Asp Leu Lys Phe Ile Asn Gly Glu Ala Asn Ser Asp Gly Trp Lys Pro
195 200 205

Gln Ala Asn Asp Lys Asn Ala Gly Asn Gly Lys Tyr Gly Ser Cys Cys
210 215 220

Ser Glu Met Asp Val Trp Glu Ala Asn Ser Gln Ala Thr Ala Tyr Thr
225 230 235 240

Pro His Val Cys Lys Thr Thr Gly Gln Gln Arg Cys Ser Gly Thr Ser
245 250 255

Glu Cys Gly Gly Gln Asp Gly Ala Ala Arg Phe Gln Gly Leu Cys Asp
260 265 270

Glu Asp Gly Cys Asp Phe Asn Ser Trp Arg Gln Gly Asp Lys Thr Phe
275 280 285

Tyr Gly Pro Gly Leu Thr Val Asp Thr Lys Ser Pro Phe Thr Val Val
290 295 300

Thr Gln Phe Val Gly Ser Pro Val Lys Glu Ile Arg Arg Lys Tyr Val
305 310 315 320

Gln Asn Gly Lys Val Ile Glu Asn Ser Lys Asn Lys Ile Ser Gly Ile
325 330 335

Asp Glu Thr Asn Ala Val Ser Asp Thr Phe Cys Asp Gln Gln Lys Lys
340 345 350

Ala Phe Gly Asp Thr Asn Asp Phe Lys Asn Lys Gly Gly Phe Ala Lys
355 360 365

Leu Gly Gln Val Phe Glu Thr Gly Gln Val Leu Val Leu Ser Leu Trp
370 375 380

Asp Asp His Ser Val Ala Met Leu Trp Leu Asp Ser Ala Tyr Pro Thr
385 390 395 400

Asn Lys Asp Lys Ser Ser Pro Gly Val Asp Arg Gly Pro Cys Pro Thr
405 410 415

Thr Ser Gly Lys Pro Asp Asp Val Glu Ser Gln Ser Pro Asp Ala Thr
420 425 430

Val Ile Tyr Gly Asn Ile Lys Phe Gly Ala Leu Asp Ser Thr Tyr
435 440 445



<210> 15 

<211> 1359 

<212> DNA 

<213> Melanocarpus albomyces 
<220> 

<221> CDS 

<222> (1)..(1359) 

<223> 

<400> 15 

atg atg atg aag cag tac ctc cag tac ctc gcg gcc gcg ctg ccg ctc 48
Met Met Met Lys Gln Tyr Leu Gln Tyr Leu Ala Ala Ala Leu Pro Leu
1 5 10 15

gtc ggc ctc gcc gcc ggc cag cgc gct ggt aac gag acg ccc gag agc 96
Val Gly Leu Ala Ala Gly Gln Arg Ala Gly Asn Glu Thr Pro Glu Ser
20 25 30

cac ccc ccg ctc acc tgg cag agg tgc acg gcc ccg ggc aac tgc cag 144
His Pro Pro Leu Thr Trp Gln Arg Cys Thr Ala Pro Gly Asn Cys Gln
35 40 45

acc gtg aac gcc gag gtc gta att gac gcc aac tgg cgc tgg ctg cac 192
Thr Val Asn Ala Glu Val Val Ile Asp Ala Asn Trp Arg Trp Leu His
50 55 60

gac gac aac atg cag aac tgc tac gac ggc aac cag tgg acc aac gcc 240
Asp Asp Asn Met Gln Asn Cys Tyr Asp Gly Asn Gln Trp Thr Asn Ala
65 70 75 80

tgc agc acc gcc acc gac tgc gct gag aag tgc atg atc gag ggt gcc 288
Cys Ser Thr Ala Thr Asp Cys Ala Glu Lys Cys Met Ile Glu Gly Ala
85 90 95

ggc gac tac ctg ggc acc tac ggc gcc tcg acc agc ggc gac gcc ctg 336
Gly Asp Tyr Leu Gly Thr Tyr Gly Ala Ser Thr Ser Gly Asp Ala Leu
100 105 110

acg ctc aag ttc gtc acg aag cac gag tac ggc acc aac gtc ggc tcg 384
Thr Leu Lys Phe Val Thr Lys His Glu Tyr Gly Thr Asn Val Gly Ser
115 120 125

cgc ttc tac ctc atg aac ggc ccg gac aag tac cag atg ttc gac ctc 432
Arg Phe Tyr Leu Met Asn Gly Pro Asp Lys Tyr Gln Met Phe Asp Leu
130 135 140

ctg ggc aac gag ctt gcc ttt gac gtc gac ctc tcg acc gtc gag tgc 480
Leu Gly Asn Glu Leu Ala Phe Asp Val Asp Leu Ser Thr Val Glu Cys
145 150 155 160

ggc atc aac agc gcc ctg tac ttc gtc gcc atg gag gag gac ggc ggc 528
Gly Ile Asn Ser Ala Leu Tyr Phe Val Ala Met Glu Glu Asp Gly Gly
165 170 175

atg gcc agc tac ccg agc aac cag gcc ggc gcc cgg tac ggc act ggg 576
Met Ala Ser Tyr Pro Ser Asn Gln Ala Gly Ala Arg Tyr Gly Thr Gly
180 185 190

tac tgc gat gcc caa tgc gct cgt gac ctc aag ttc gtt ggc ggc aag 624
Tyr Cys Asp Ala Gln Cys Ala Arg Asp Leu Lys Phe Val Gly Gly Lys
195 200 205

gcc aac att gag ggc tgg aag ccg tcc acc aac gac ccc aac gct ggc 672
Ala Asn Ile Glu Gly Trp Lys Pro Ser Thr Asn Asp Pro Asn Ala Gly
210 215 220

gtc ggc ccg tac ggc ggc tgc tgc gct gag atc gac gtc tgg gag tcg 720
Val Gly Pro Tyr Gly Gly Cys Cys Ala Glu Ile Asp Val Trp Glu Ser
225 230 235 240

aac gcc tat gcc ttc gct ttc acg ccg cac gcg tgc acg acc aac gag 768
Asn Ala Tyr Ala Phe Ala Phe Thr Pro His Ala Cys Thr Thr Asn Glu
245 250 255

tac cac gtc tgc gag acc acc aac tgc ggt ggc acc tac tcg gag gac 816
Tyr His Val Cys Glu Thr Thr Asn Cys Gly Gly Thr Tyr Ser Glu Asp
260 265 270

cgc ttc gcc ggc aag tgc gac gcc aac ggc tgc gac tac aac ccc tac 864
Arg Phe Ala Gly Lys Cys Asp Ala Asn Gly Cys Asp Tyr Asn Pro Tyr
275 280 285



cgc atg ggc aac ccc gac ttc tac ggc aag ggc aag acg ctc gac acc 912
Arg Met Gly Asn Pro Asp Phe Tyr Gly Lys Gly Lys Thr Leu Asp Thr
290 295 300

agc cgc aag ttc acc gtc gtc tcc cgc ttc gag gag aac aag ctc tcc 960
Ser Arg Lys Phe Thr Val Val Ser Arg Phe Glu Glu Asn Lys Leu Ser
305 310 315 320

cag tac ttc atc cag gac ggc cgc aag atc gag atc ccg ccg ccg acg 1008
Gln Tyr Phe Ile Gln Asp Gly Arg Lys Ile Glu Ile Pro Pro Pro Thr
325 330 335

tgg gag ggc atg ccc aac agc agc gag atc acc ccc gag ctc tgc tcc 1056
Trp Glu Gly Met Pro Asn Ser Ser Glu Ile Thr Pro Glu Leu Cys Ser
340 345 350

acc atg ttc gat gtg ttc aac gac cgc aac cgc ttc gag gag gtc ggc 1104
Thr Met Phe Asp Val Phe Asn Asp Arg Asn Arg Phe Glu Glu Val Gly
355 360 365

ggc ttc gag cag ctg aac aac gcc ctc cgg gtt ccc atg gtc ctc gtc 1152
Gly Phe Glu Gln Leu Asn Asn Ala Leu Arg Val Pro Met Val Leu Val
370 375 380

atg tcc atc tgg gac gac cac tac gcc aac atg ctc tgg ctc gac tcc 1200
Met Ser Ile Trp Asp Asp His Tyr Ala Asn Met Leu Trp Leu Asp Ser
385 390 395 400

atc tac ccg ccc gag aag gag ggc cag ccc ggc gcc gcc cgt ggc gac 1248
Ile Tyr Pro Pro Glu Lys Glu Gly Gln Pro Gly Ala Ala Arg Gly Asp
405 410 415

tgc ccc acg gac tcg ggt gtc ccc gcc gag gtc gag gct cag ttc ccc 1296
Cys Pro Thr Asp Ser Gly Val Pro Ala Glu Val Glu Ala Gln Phe Pro
420 425 430

gac gcc cag gtc gtc tgg tcc aac atc cgc ttc ggc ccc atc ggc tcg 1344
Asp Ala Gln Val Val Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser
435 440 445

acc tac gac ttc taa 1359
Thr Tyr Asp Phe
450



<210> 16 
<211> 452 
<212> PRT 

<213> Melanocarpus albomyces 
<400> 16 

Met Met Met Lys Gln Tyr Leu Gln Tyr Leu Ala Ala Ala Leu Pro Leu
1 5 10 15

Val Gly Leu Ala Ala Gly Gln Arg Ala Gly Asn Glu Thr Pro Glu Ser
20 25 30

His Pro Pro Leu Thr Trp Gln Arg Cys Thr Ala Pro Gly Asn Cys Gln
35 40 45

Thr Val Asn Ala Glu Val Val Ile Asp Ala Asn Trp Arg Trp Leu His
50 55 60

Asp Asp Asn Met Gln Asn Cys Tyr Asp Gly Asn Gln Trp Thr Asn Ala
65 70 75 80

Cys Ser Thr Ala Thr Asp Cys Ala Glu Lys Cys Met Ile Glu Gly Ala
85 90 95

Gly Asp Tyr Leu Gly Thr Tyr Gly Ala Ser Thr Ser Gly Asp Ala Leu 
100 105 110 

Thr Leu Lys Phe Val Thr Lys His Glu Tyr Gly Thr Asn Val Gly Ser 
115 120 125 

Arg Phe Tyr Leu Met Asn Gly Pro Asp Lys Tyr Gln Met Phe Asp Leu
130 135 140

Leu Gly Asn Glu Leu Ala Phe Asp Val Asp Leu Ser Thr Val Glu Cys
145 150 155 160

Gly Ile Asn Ser Ala Leu Tyr Phe Val Ala Met Glu Glu Asp Gly Gly
165 170 175

Met Ala Ser Tyr Pro Ser Asn Gln Ala Gly Ala Arg Tyr Gly Thr Gly
180 185 190

Tyr Cys Asp Ala Gln Cys Ala Arg Asp Leu Lys Phe Val Gly Gly Lys
195 200 205

Ala Asn Ile Glu Gly Trp Lys Pro Ser Thr Asn Asp Pro Asn Ala Gly
210 215 220

Val Gly Pro Tyr Gly Gly Cys Cys Ala Glu Ile Asp Val Trp Glu Ser
225 230 235 240

Asn Ala Tyr Ala Phe Ala Phe Thr Pro His Ala Cys Thr Thr Asn Glu
245 250 255

Tyr His Val Cys Glu Thr Thr Asn Cys Gly Gly Thr Tyr Ser Glu Asp
260 265 270

Arg Phe Ala Gly Lys Cys Asp Ala Asn Gly Cys Asp Tyr Asn Pro Tyr 
275 280 285 

Arg Met Gly Asn Pro Asp Phe Tyr Gly Lys Gly Lys Thr Leu Asp Thr 
290 295 300 

Ser Arg Lys Phe Thr Val Val Ser Arg Phe Glu Glu Asn Lys Leu Ser 
305 310 315 320 

Gln Tyr Phe Ile Gln Asp Gly Arg Lys Ile Glu Ile Pro Pro Pro Thr
325 330 335

Trp Glu Gly Met Pro Asn Ser Ser Glu Ile Thr Pro Glu Leu Cys Ser
340 345 350

Thr Met Phe Asp Val Phe Asn Asp Arg Asn Arg Phe Glu Glu Val Gly
355 360 365

Gly Phe Glu Gln Leu Asn Asn Ala Leu Arg Val Pro Met Val Leu Val
370 375 380

Met Ser Ile Trp Asp Asp His Tyr Ala Asn Met Leu Trp Leu Asp Ser
385 390 395 400

Ile Tyr Pro Pro Glu Lys Glu Gly Gln Pro Gly Ala Ala Arg Gly Asp
405 410 415

Cys Pro Thr Asp Ser Gly Val Pro Ala Glu Val Glu Ala Gln Phe Pro
420 425 430

Asp Ala Gln Val Val Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser
435 440 445

Thr Tyr Asp Phe 
450 



<210> 17 
<211> 221 
<212> DNA 

<213> Trichothecium roseum 
<220> 

<221> misc_feature
<222> (1)..(221) 

<223> Partial CBH1 encoding sequence 
<400> 17 

tacgcccagt gcgcccgtga cctcaagttc ctcggcggca cttccaacta cgacggctgg 60 
aagccctcgg acactgacga cagcgccggt gtcggcaacc gcggatcctg ctgcgccgag 120 
attgacatct gggagtccaa ctcgcacgcc ttcgccttca ccccccacgc ctgcgagaac 180 
aacgagtacc acatctgcga gaccaccgac tgcggcggca c 221 

<210> 18 
<211> 239 
<212> DNA 

<213> Humicola nigrescens 
<220> 

<221> misc_feature
<222> (1)..(239) 

<223> Partial CBH1 encoding sequence 
<400> 18 

tacggcacgg ggtactgcga cgcccaatgc gcccgcgatc tcaagttcgt tggcggcaag 60 
gccaatgttg agggctggaa acagtccacc aacgatgcca atgccggcgt gggtccgatg 120 
ggcggttgct gcgccgaaat tgacgtctgg gaatcgaacg cccatgcctt cgccttcacg 180 
ccgcacgcgt gcgagaacaa caagtaccac atctgcgaga ctgacggatg cggcggcac 239 

<210> 19 

<211> 199 

<212> DNA 

<213> Cladorrhinum foecundissimum
<220>

<221> misc_feature
<222> (1)..(199)

<223> Partial CBH1 encoding sequence 
<400> 19 

tacataaacg gtatcggcaa cgttgagggt tggtcctcct ctaccaacga tcccaacgct 60 
ggtgtcggta accrcggtac ttgctgctcc gagaatggat atctgggagg ccaacaagat 120 
ctcgaccgcc tacactcccc acccctgcac caccatcgac cagcacatgt gcgagggcaa 180 
ctcgtgcggc ggaacctac 199

<210> 20 

<211> 191 

<212> DNA 

<213> Diplodia gossypina 
<220> 

<221> misc_feature 

<222> (1)..(191)

<223> Partial CBH1 encoding sequence 
<400> 20 

gttgatccga cggcaaggcc caacgtcgag ggctgggtcc cgtccgagaa cgactccaac 60 
gctggtgtcg gcaaccttgg ctcttgctgt gctgagatgg atatctggga ggccaactcc 120 
atctcgaccg cctacacccc ccacagctgc aagacggtcg cccagcactc ttgcactggc 180 
gacgactgcg g 191 

<210> 21 
<211> 232 
<212> DNA 

<213> Myceliophthora thermophila 
<220> 

<221> misc_feature 
<222> (1)..(232) 

<223> Partial CBH1 encoding sequence 
<400> 21 

gggtactgcg acgcccaatg cgcacgcgac ctcaagttcg tcggcggcaa gggcaacatc 60 
gagggctgga agccgtccac caacgatgcc aatgccggtg tcggtcctta tggcgggtgc 120 
tgcgctgaga tcgacgtctg ggagtcgaac aagtatgctt tcgctttcac cccgcacggt 180 
tgcgagaacc ctaaatacca cgtctgcgag accaccaact gcggcggcac ct 232 

<210> 22 
<211> 467 
<212> DNA 
<213> Rhizomucor pusillus

<220> 

<221> misc_feature 

<222> (1)..(467) 

<223> Partial CBH1 encoding sequence 

<400> 22 



tccttcgcct ttacccccca cgcttgctcg cagnaacgag taccacgtct gcaccaccaa 60
caactgcggc ggcacctact cggacgaccg cttcgccggc aagtgcgacg ccaacggttg 120
cgactacaac ccgttccgcc tgggcaacca ggacttctac ggcccgggca tgaccgtcga 180
caccaactcc aagttcaccg tcatctcccg cttcagggag aacgaggcct accaggtctt 240
catgcagggc ggccggacca tcgaggtccc ggccccgcag ctgtccgggc tcacccagtt 300
cgacgccaag atcacccccg agttctgcga cacctacccg accgtcttcg acgaccgcaa 360
ccgccacggc gagatcggcg gccacaccgc cctcaacgcc gccctgcgca tgcccatggt 420
cctcgtcatg tccatctggg ctgaccacta cgccagctgc tagtgtc 467



<210> 23 

<211> 534 

<212> DNA 

<213> Meripilus giganteus 
<220> 

<221> misc_feature 

<222> (1)..(534) 

<223> Partial CBH1 encoding sequence 

<400> 23 



gggagggctc cccgaacgac ccgaacgcgg gaagcggcca gtacggaacg tgctgcaacg 60
agatggacat ctgggaggcg aaccagaacg gcgcggcggt cacgccgcac gtctgctccg 120
tcgacggcca gacgcgctgc gagggcacgg actgcggcga cggcgacgag cggtacgacg 180
gcatctgcga caaggacggc tgcgacttca actcgtaccg catgggcgac cagtccttcc 240
tcggcctcgg caagaccgtc gacacctcga agaagttcac cgtcgtcacc cagttcctca 300
ccgcggacaa cacgacgtcc ggccagctca cggagatccg ccggctgtac gtgcaggacg 360
gcaaggtcat cgcgaactcg aagacgaaca tccccggcct cgactcgttc gactccatca 420
ccgacgactt ctgcaacgcg cagaaggagg tcttcggcga caccaactcg ttcgagaagc 480
tcggcggcct cgcggagatg ggcaaggcct tccagaaggg catggtcctc gtca 534



<210> 24 

<211> 563 

<212> DNA 

<213> Exidia glandulosa 
<220>

<221> misc_feature 
<222> (1)..(563) 

<223> Partial CBH1 encoding sequence 
<400> 24 

gccacgtcga gggctggact ccttcmccaa cgatgccaac gccggcattg gcacccacgg 60 

ctcctgctgt tcggagatgg acatctggga ggctaacaat gttgccgctg cgtacacccc 120 

ccatccttgc acaactatcg gccagtcgat ctgctcgggc gattcttgcg gaggaaccta 180 

cagctctgac cgttacgccg gtgtctgcga tccagacggt tgcgatttca acagctaccg 240 

catgggcgac acgggcttct acggcaaggg cctgacagtc gacacgagct ccaagttcac 300 

cgtcgtcacc cagttcctca ccggctccga cggcaacctt tccgagatca agcgcttcta 360 

cgtccagaac ggcaaggtca ttcccaactc gcagtccaag attgccggcg tcagcggcaa 420 

ctccatcacc accgacttct gctccgccca gaagaccgcc ttcggcgaca ccaacgtctt 480 

cgcgcaaaag ggaggtactc gccgggatgg gcgccgccct caaggccggc atggtcctcg 540

tcatgtccat ctgggacgac cac 563 



<210> 25 
<211> 218 
<212> DNA 

<213> Xylaria hypoxylon 
<220> 

<221> misc_feature
<222> (1)..(218) 

<223> Partial CBH1 encoding sequence 
<400> 25 

gacgctcagt gtgcccgtga cttgaagttc gtcggtggca agggcaacgt tgagggatgg 60 
gagccatcca ccaacgacga caacgccggt gttggccctt acggwgcctg ctgtgccgaa 120 
atsgatgtst gggagtccaa ctstcactct ttcgctttca cccctcaccc wtgcaccacc 180 
aacgaatacc acgtctgtga gcaggacgag tgtggcgg 218 



<210> 26 

<211> 492

<212> DNA 

<213> Acremonium sp. 
<220> 

<221> misc_feature 

<222> (1)..(492) 

<223> Partial CBH1 encoding sequence 

<400> 26 
gggacggggt actgcgacgc ccaatgcgcc cgtgatctca agttcgtcgg cggcaaggcc 60

aacattgagg gctggaggcc gtccaccaac gacgcgaacg ccggcgtcgg cccgatgggc 120 

ggctgctgcg cggaaatcga tgtctgggag tccaacgccc acgcttttgc cttcacgccg 180 

cacgcgtgcg agaacaacaa ctaccacatc tgcgagacct ccaactgcgg cggtacctac 240 

tccgacgacc gcttcgccgg cctctgcgac gccaacggct gcgactacaa cccgtaccgc 300 

atgggcaacc ccgacttcta cggcaagggc aagactcttg acacctcgcg gaagttcacc 360 

gtcgtcaccc gctttcagga gaacgacctc tcgcagtact tcgtccagga cggcccgaag 420 

atcgagatcc cgcccccgac ctgggacggc ctcccgaaga gcagcacata cgccgagctg 480 

tgcgcgaccc ag 492 

<210> 27 
<211> 481 
<212> DNA

<213> Acremonium sp.
<220>

<221> misc_feature 
<222> (1)..(481) 

<223> Partial CBH1 encoding sequence 
<400> 27 

ggctccgttt actcctaccc ttgcacggaa atcggccaga gccgctgcga gggcgacagc 60 

tgcggcggta cctacagcac cgaccgctac gctggcgtct gcgaccccga tggatgcgac 120 

ttcaactcgt accgccaggg caacaagacc ttctatggca agggcatgac cgtcgacacc 180 

accaagaaga ttaccgtcgt cacccagttc ctcaccgact cgtccggcaa cctgtccgag 240 

atcaagcgct tctacgccca gaacggcgtc gtcatcccca actccgagtc caccattgct 300 

ggcgtccctg gcaactcgat cacccaggac tactgcgaca agcagaagac cgcctttggt 360 

gacaacaacg acttcgacaa gaagggtggt ctcgcccaga tgggtaaggc cctggcccaa 420 

cccatggtcc tcgtcatgtc cgtctgggat gaccatgccg tcaacatgct ctgcttcgaa 480 

a 481 



<210> 28 

<211> 463 

<212> DNA 

<213> Chaetomium sp. 
<220> 

<221> misc_feature 

<222> (1)..(463)

<223> Partial CBH1 encoding sequence 

<400> 28 

ctccccgtct tcacgccgca cgcgtgcaag aacatcaagt accacgtctg cgagacgtcg 60
ggatgcggcg gcacctactc ggaggaccgc ttcgcgggcg actgcgacgc caacggttgc 120
gactacaacc cctaccgcat gggcaacacc gacttctacg gcaagggcat gacggtcgac 180
accagcaaga agttcaccgt cgtgacccaa ttccaggaga acaagctcac ccagttcttc 240
gtccagaacg gcaagaagat cgagatccct ggccccaagt gggacggcat tgagggcgac 300
agcgccgcca tcacgcccca gctgtgcact tccatgttca aggccttcga cgaccgcgat 360
cgcttctcgg aggtcggcgg cttcacccag atcaaccagg ccctctcggt gcccatggtg 420
ctcgtcatgt ccatctggga cgaccactac gccaacatgc ttg 463



<210> 29 

<211> 513 

<212> DNA 

<213> Chaetomidium pingtungium 
<220> 

<221> misc_feature

<222> (1)..(513) 

<223> Partial CBH1 encoding sequence 

<400> 29 



gaagggtggc agccctcctc caacgatgcc aatgcgggta ccggcaacca cgggtcctgc 60
tgcgcggaga tggatatctg ggaggccaac agcatctcca cggccttcac cccccatccg 120
tgcgacacgc ccggccaggt gatgtgcacc ggtgatgcct gcggtggcac ctacagctcc 180
gaccgctacg gcggcacctg cgaccccgac ggatgtgatt tcaactcctt ccgccagggc 240
aacaagacct tctacggccc tggcatgacc gtcgacacca agagcaagtt taccgtcgtc 300
acccagttca tcaccgacga cggcacctcc agcggcaccc tcaaggagat caagcgcttc 360
tacgtgcaga acggcaaggt gatccccaac tcggagtcga cctggaccgg cgtcagcggc 420
aactccatca ccaccgagta ctgcaccgcc cagaagagcc tgttccagga ccagaacgtc 480
ttcgaaaagc acggtggcct cgagggcatg ggt 513



<210> 30

<211> 579

<212> DNA

<213> Myceliophthora thermophila
<220>

<221> misc_feature

<222> (1)..(579)

<223> Partial CBH1 encoding sequence

<400> 30



gagatggata tttgggaggc caacaacatg gccgccgcct tcactcccca cccttgcacc 60 
gtgatcggcc agtcgcgctg cgagggcgac tcgtgcggcg gtacctacag caccgaccgc 120
tatgccggca tctgcgaccc cgacggatgc gacttcaact cgtaccgcca gggcaacaag 180
accttctacg gcaagggcat gacggtcgac acgaccaaga agatcacggt cgtcacccag 240
ttcctcaaga actcggccgg cgagctctcc gagatcaagc ggttctacgt ccagaacggc 300
aaggtcatcc ccaactccga gtccaccatc ccgggcgtcg agggcaactc cattacccag 360
gactggtgcg accgccagaa ggccgctttc ggcgacgtga ccgactttca ggacaagggc 420
ggcatggtcc agatgggcaa ggccctcgcg ggcccaatgg tcctcgtcat gtccatctgg 480
gacgaccacg ccgtcaacat gctctggctc gaaatcacta gtgcggccgc tgcaggtcga 540
ccatatggga gagctccacg cgttggatgc atagcttga 579

<210> 31 

<211> 514

<212> DNA 

<213> Myceliophthora hinnulea 
<220> 

<221> misc_feature 

<222> (1)..(514)

<223> Partial CBH1 encoding sequence 

<400> 31 



cgtgagggct gggagagctc gaccaacgat gccaacgccg gcacgggcag gtacggcagc 60

tgctgctccg agatggacgt ctgggaggcc aacaacatgg ccaccgcctt caccccccat 120

ccttgcacca tcatcggcca gtcgcgctgc gagggcgaga cgtgcggcgg cacctacagc 180

tcggaccgct acgccggcgt ctgcgacccc gacggctgcg acttcaactc gtaccgccag 240

ggcaacaaga ccttctacgg caagggcatg acggtcgaca cgaccaagaa gctcacggtc 300

gtcacgcagt tcctcaagaa ctcggccggc gagctgtccg agatcaagcg gttctacgtc 360

caggacggca aggtgatccc caactccgag tccaccatcc ccggcgtcga gggcaactcg 420

atcacgcagg actggtgcga ccgccagaag gccgccttcg gcgacgtcac cgacttccag 480

gacaagggcg gcatggtcca gatggcaagg cgct 514



<210> 32

<211> 477

<212> DNA

<213> Sporotrichum pruinosum
<220>

<221> misc_feature

<222> (1)..(477)

<223> Partial CBH1 encoding sequence

<400> 32

cacccttgcc gcaccacgaa cgacggtggc taccaacgct gccaaggacg tgactgcaac 60

cagcctcgtt atgagggtct ttgcgatcct gacggttgcg actacaaccc tttccgtatg 120

ggtaaccgcg aattctacgg ccctggaaag accgtcgaca ccaacaggaa gttcactgtt 180

gtgacccaat tcattaccga caacaactct gacactggta ccctcgtcga CditCCClCCGC 240

ctctacgtcc aagacggccg tgtcattgcc aaccctccca ccaacttccc cggtctcatg 300

cccgcccacg actccatcac ttagcaattc tgtgacgacg ccaagcgagc attcgaggac 360

aacgacagct ttggcaggaa cggtggtctt gctcacatgg gtcgctccct tgccaagggc 420

catgtcctcg ccctttccat ttggaatgat cacactgcca acatgctctg gctcgaa 477



<210> 33 

<211> 500 

<212> DNA 

<213> Thielavia cf. microspora 
<220> 

<221> misc_feature
<222> (1)..(500)

<223> Partial CBH1 encoding sequence 

<400> 33 



gagatagatg tctgggagtc caactcgcac tcgtttgcct tcacgccgca cgcgtgcaag 60

aacaacaagt accacgtctg ccagacgacc gggtgcggcg gcacctactc ggaggaccgc 120

ttcgccggcg actgcgacgc caacggctgc gactacaacc cctaccgcat gggcaacacc 180

gacttttacg gcaagggcaa gacggtcgac acgagcaaga agtttaccat ggtgacccag 240

ttccaaaaga acaagctcgt ccagttcttt gtccaggacg gcaagaagat cgacatcccc 300

ggccccaagt gggacggcct gccgcagggc agcgccgcca tcaccccgga gctgtgcacc 360

ttcatgttca aggccttcaa cgaccgcgac cgcttctcag aggttggcgg cttcgaccag 420

atcaacacgg ccctctcggt gccaatggtg ctcgtcatgt ccatctggga tgatcactac 480

gccaacatgc tctggcttga 500



<210> 34 

<211> 470 

<212> DNA 

<213> Scytalidium sp. 
<220> 

<221> misc_feature

<222> (1)..(470)

<223> Partial CBH1 encoding sequence 

<400> 34 

cgttnggccc gcgtcgcatg ctcccgcccg catggcccgc gggatttcca gccagagcat 60

gttggagtgg tggtcatccc agatggacat gacaaggacc atgggaatgg tgagggcctc 120

gttcagagca tcgaagccac cggtctcggc gaagcggttg cggtcatcga agacgcggaa 180

ctgagcatcg cagagctcag gggtgatgtc ggcgctgttc gggaggccgg gccaggtcgg 240

agggggcacc tcgatcttgc ggccgtcctg gacgaagaac tgagagagcc tgttacgctc 300

gaagcgggag acaacggtga acttgcggtt ggtgtcgacg gtcttgccct tgccatagaa 360

gtccttgttg cccatgcggt aggggttgta gtcgcagccg ttggcatcgc agtagccggc 420

gaagcggtca tccgagtagg taccaccgca gttgttggtc tccagatgtg 470



<210> 35 

<211> 491 

<212> DNA 

<213> Scytalidium sp. 
<220> 

<221> misc_feature 

<222> (1)..(491)

<223> Partial CBH1 encoding sequence 

<400> 35 



gaaatcgacg tctgggagtc gaacgcctat gcctatgcct taccccgcac gcttgcggca 60

gccagaaccg ctaccacgtc tgcgagacca acaactgcgg tggtacctac tcggatgacc 120

gcttcgccgg ttactgcgat gccaacggct gcgactacaa cccgtaccgc atgggcaaca 180

gggacttcta cggcaagggc ctgcaggtcg acaccagccg gaagttcacc gtcgtgagcc 240

gcttcgagcg caacaagctc acccagttct tcgttcagga cggccgcaag atcgagcccc 300

ctgcgccgac ctgggacggc atcccgaaga gcgccgacat cacccccgag ttctgcagcg 360

cccagttcaa ggtcttcgac gaccgtgacc gcttcgcgga gactggcggc ttcgatgccc 420

tgaacgatgc tctcagcatt cccatggtcc ttgtcatgtc catctgggat taccactact 480

ccaacataat c 491



<210> 36 
<211> 221 
<212> DNA 

<213> Trichophaea saccata 
<220> 

<221> misc_feature 
<222> (1)..(221) 

<223> Partial CBH1 encoding sequence 
<400> 36 

tgcgactccc agtgtccccg cgatctcaag ttcatcaatg gacagggcaa cgttgaaggc 60 
tggaagccat cctcaaatga tgccaacgca ggcgtcgggg gacacggttc ctgctgcgca 120 
gagatggatg tttgggaggc caattccatc tccgcggccg taacaccgca ctcgtgctcc 180
acaaccagcc agacgatgtg caacggcgac tcctgcggcg g 221 

<210> 37 

<211> 1365 

<212> DNA 

<213> Diplodia gossypina 
<220> 

<221> CDS 

<222> (1)..(1365) 

<223> 

<400> 37 

atg ctt acc cag gca gtt ctc gct act ctc gcc acc ctg gcc gcc agc 48
Met Leu Thr Gln Ala Val Leu Ala Thr Leu Ala Thr Leu Ala Ala Ser
1 5 10 15

cag cag gtc ggc acc cag aag gag gag gtc cac ccc tcc atg acc tgg 96
Gln Gln Val Gly Thr Gln Lys Glu Glu Val His Pro Ser Met Thr Trp
20 25 30

cag act tgc acc agc agc ggc tgc acc acc aac cag ggc tcc atc gtc 144
Gln Thr Cys Thr Ser Ser Gly Cys Thr Thr Asn Gln Gly Ser Ile Val
35 40 45

gtt gac gcc aac tgg cgc tgg gtc cac aac acc gag ggc tac acc aac 192
Val Asp Ala Asn Trp Arg Trp Val His Asn Thr Glu Gly Tyr Thr Asn
50 55 60

tgc tac acg ggc aac acc tgg aac gcc gac tac tgc acc gac aac acc 240
Cys Tyr Thr Gly Asn Thr Trp Asn Ala Asp Tyr Cys Thr Asp Asn Thr
65 70 75 80

gag tgc gcc tcc aac tgc gcc ctc gac ggc gcc gac tac tct ggc acc 288
Glu Cys Ala Ser Asn Cys Ala Leu Asp Gly Ala Asp Tyr Ser Gly Thr
85 90 95

tac ggc gct acc acc tcc ggc gac tcg ctg cgc ctg aac ttc atc acc 336
Tyr Gly Ala Thr Thr Ser Gly Asp Ser Leu Arg Leu Asn Phe Ile Thr
100 105 110

aac ggc cag cag aag aac att ggc tcc cgc atg tac ctc atg cag gat 384
Asn Gly Gln Gln Lys Asn Ile Gly Ser Arg Met Tyr Leu Met Gln Asp
115 120 125

gac gag acc tac gcc gtc cac aag ctc ctc aac aag gag ttc acc ttc 432
Asp Glu Thr Tyr Ala Val His Lys Leu Leu Asn Lys Glu Phe Thr Phe
130 135 140

gac gtc gac acc tcc aag ctg cct tgc ggc ctc aac ggt gcc gtc tac 480
Asp Val Asp Thr Ser Lys Leu Pro Cys Gly Leu Asn Gly Ala Val Tyr
145 150 155 160

ttc gtc tcc atg gac gct gac ggt ggc atg gcc aag ttc ccc gac aac 528
Phe Val Ser Met Asp Ala Asp Gly Gly Met Ala Lys Phe Pro Asp Asn
165 170 175

aag gcc ggc gcc aag tac ggt acc ggt tac tgc gac tcg cag tgc ccc 576
Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser Gln Cys Pro
180 185 190

cgt gac ctc aag ttc atc gac ggc aag gcc aac gtc gag ggc tgg gtc 624
Arg Asp Leu Lys Phe Ile Asp Gly Lys Ala Asn Val Glu Gly Trp Val
195 200 205

ccg tcc gag aac gac tcc aac gct ggt gtc ggc aac ctt ggc tct tgc 672
Pro Ser Glu Asn Asp Ser Asn Ala Gly Val Gly Asn Leu Gly Ser Cys
210 215 220

tgt gct gag atg gat atc tgg gag gcc aac tcc atc tcg acc gcc tac 720
Cys Ala Glu Met Asp Ile Trp Glu Ala Asn Ser Ile Ser Thr Ala Tyr
225 230 235 240

acc ccc cac agc tgc aag acg gtc gcc cag cac tct tgc act ggc gac 768
Thr Pro His Ser Cys Lys Thr Val Ala Gln His Ser Cys Thr Gly Asp
245 250 255

gac tgc ggt ggc acc tac tcc gcg acc cgc tac gcc ggc gac tgc gac 816
Asp Cys Gly Gly Thr Tyr Ser Ala Thr Arg Tyr Ala Gly Asp Cys Asp
260 265 270

ccc gac gga tgc gac ttc aac tcg tac cgc cag ggc gtc aag gac ttc 864
Pro Asp Gly Cys Asp Phe Asn Ser Tyr Arg Gln Gly Val Lys Asp Phe
275 280 285

tac ggg ccc ggc atg acc gtc gac agc aac tcg gtc gtc acc gtc gtc 912
Tyr Gly Pro Gly Met Thr Val Asp Ser Asn Ser Val Val Thr Val Val
290 295 300

acg cag ttc atc acc aac gac ggc acc gcg tcc ggc acc ctc tcc gag 960
Thr Gln Phe Ile Thr Asn Asp Gly Thr Ala Ser Gly Thr Leu Ser Glu
305 310 315 320

atc aag cgc ttc tac gtc cag aac ggc aag gtt atc ccc aac tcc gag 1008
Ile Lys Arg Phe Tyr Val Gln Asn Gly Lys Val Ile Pro Asn Ser Glu
325 330 335

tcc acc atc gcc ggc gtc agc ggc aac agc atc acc tcc gcg tac tgc 1056
Ser Thr Ile Ala Gly Val Ser Gly Asn Ser Ile Thr Ser Ala Tyr Cys
340 345 350

gac gcg cag aag gag gtc ttc ggc gac aac acg tcg ttc cag gac cag 1104
Asp Ala Gln Lys Glu Val Phe Gly Asp Asn Thr Ser Phe Gln Asp Gln
355 360 365

ggc ggc ttg gcc agc atg agc cag gcc ctc aac gcc ggc atg gtc ctc 1152
Gly Gly Leu Ala Ser Met Ser Gln Ala Leu Asn Ala Gly Met Val Leu
370 375 380

gtc atg tcc atc tgg gac gac cac cac agc aac atg ctc tgg ctc gac 1200
Val Met Ser Ile Trp Asp Asp His His Ser Asn Met Leu Trp Leu Asp
385 390 395 400

tcc gac tac ccc gtc gac gcc gac ccg agc cag ccc ggc atc tcc cgc 1248
Ser Asp Tyr Pro Val Asp Ala Asp Pro Ser Gln Pro Gly Ile Ser Arg
405 410 415

ggt act tgc ccc acc acc tct ggt gtc ccc agc gag gtt gag gag agc 1296
Gly Thr Cys Pro Thr Thr Ser Gly Val Pro Ser Glu Val Glu Glu Ser
420 425 430

gcc gct agc gcc tac gtc gtc tac tcg aac att aag gtt ggt gac ctt 1344
Ala Ala Ser Ala Tyr Val Val Tyr Ser Asn Ile Lys Val Gly Asp Leu
435 440 445

aac agc act ttc tct gct tag 1365
Asn Ser Thr Phe Ser Ala
450



<210> 38 

<211> 454 

<212> PRT 

<213> Diplodia gossypina 

<400> 38 

Met Leu Thr Gln Ala Val Leu Ala Thr Leu Ala Thr Leu Ala Ala Ser
1 5 10 15

Gln Gln Val Gly Thr Gln Lys Glu Glu Val His Pro Ser Met Thr Trp
20 25 30

Gln Thr Cys Thr Ser Ser Gly Cys Thr Thr Asn Gln Gly Ser Ile Val
35 40 45

Val Asp Ala Asn Trp Arg Trp Val His Asn Thr Glu Gly Tyr Thr Asn
50 55 60

Cys Tyr Thr Gly Asn Thr Trp Asn Ala Asp Tyr Cys Thr Asp Asn Thr
65 70 75 80

Glu Cys Ala Ser Asn Cys Ala Leu Asp Gly Ala Asp Tyr Ser Gly Thr
85 90 95

Tyr Gly Ala Thr Thr Ser Gly Asp Ser Leu Arg Leu Asn Phe Ile Thr
100 105 110

Asn Gly Gln Gln Lys Asn Ile Gly Ser Arg Met Tyr Leu Met Gln Asp
115 120 125

Asp Glu Thr Tyr Ala Val His Lys Leu Leu Asn Lys Glu Phe Thr Phe
130 135 140

Asp Val Asp Thr Ser Lys Leu Pro Cys Gly Leu Asn Gly Ala Val Tyr
145 150 155 160

Phe Val Ser Met Asp Ala Asp Gly Gly Met Ala Lys Phe Pro Asp Asn
165 170 175

Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser Gln Cys Pro
180 185 190

Arg Asp Leu Lys Phe Ile Asp Gly Lys Ala Asn Val Glu Gly Trp Val
195 200 205

Pro Ser Glu Asn Asp Ser Asn Ala Gly Val Gly Asn Leu Gly Ser Cys
210 215 220

Cys Ala Glu Met Asp Ile Trp Glu Ala Asn Ser Ile Ser Thr Ala Tyr
225 230 235 240

Thr Pro His Ser Cys Lys Thr Val Ala Gln His Ser Cys Thr Gly Asp
245 250 255

Asp Cys Gly Gly Thr Tyr Ser Ala Thr Arg Tyr Ala Gly Asp Cys Asp
260 265 270

Pro Asp Gly Cys Asp Phe Asn Ser Tyr Arg Gln Gly Val Lys Asp Phe
275 280 285

Tyr Gly Pro Gly Met Thr Val Asp Ser Asn Ser Val Val Thr Val Val
290 295 300

Thr Gln Phe Ile Thr Asn Asp Gly Thr Ala Ser Gly Thr Leu Ser Glu
305 310 315 320

Ile Lys Arg Phe Tyr Val Gln Asn Gly Lys Val Ile Pro Asn Ser Glu
325 330 335

Ser Thr Ile Ala Gly Val Ser Gly Asn Ser Ile Thr Ser Ala Tyr Cys
340 345 350

Asp Ala Gln Lys Glu Val Phe Gly Asp Asn Thr Ser Phe Gln Asp Gln
355 360 365

Gly Gly Leu Ala Ser Met Ser Gln Ala Leu Asn Ala Gly Met Val Leu
370 375 380

Val Met Ser Ile Trp Asp Asp His His Ser Asn Met Leu Trp Leu Asp
385 390 395 400

Ser Asp Tyr Pro Val Asp Ala Asp Pro Ser Gln Pro Gly Ile Ser Arg
405 410 415

Gly Thr Cys Pro Thr Thr Ser Gly Val Pro Ser Glu Val Glu Glu Ser
420 425 430

Ala Ala Ser Ala Tyr Val Val Tyr Ser Asn Ile Lys Val Gly Asp Leu
435 440 445

Asn Ser Thr Phe Ser Ala
450



<210> 39 

<211> 1377 

<212> DNA 

<213> Trichophaea saccata 
<220> 

<221> CDS 

<222> (1)..(1377) 

<223> 

<400> 39 

atg caa cgc ctt ctc gtt ctt ctc acc tcc ctt ctc gct ttc acc tat 48
Met Gln Arg Leu Leu Val Leu Leu Thr Ser Leu Leu Ala Phe Thr Tyr
1 5 10 15

ggc caa caa gtt ggc act caa cag gcc gaa gtc cac ccc tcg atg acc 96
Gly Gln Gln Val Gly Thr Gln Gln Ala Glu Val His Pro Ser Met Thr
20 25 30 

tgg cag cag tgt aca aag tcc ggc ggc tgc acc acg aag aac ggc aaa 144
Trp Gln Gln Cys Thr Lys Ser Gly Gly Cys Thr Thr Lys Asn Gly Lys
35 40 45

gtc gtg atc gat gcc aac tgg cgt tgg gta cac aat gtc ggc ggc tac 192
Val Val Ile Asp Ala Asn Trp Arg Trp Val His Asn Val Gly Gly Tyr
50 55 60

acc aat tgc tac act ggc aac acc tgg gac agt tcg ctt tgt ccc gac 240
Thr Asn Cys Tyr Thr Gly Asn Thr Trp Asp Ser Ser Leu Cys Pro Asp
65 70 75 80

gat gtc acc tgc gcg aag aat tgc gct ctt gat ggc gcg gac tac tct 288
Asp Val Thr Cys Ala Lys Asn Cys Ala Leu Asp Gly Ala Asp Tyr Ser
85 90 95

ggc act tat gga gtt act gcg ggc ggg aat tcg ttg aag ctc acc ttc 336
Gly Thr Tyr Gly Val Thr Ala Gly Gly Asn Ser Leu Lys Leu Thr Phe
100 105 110

gtc act aag ggt caa tac tct act aat gtg ggc tcg cga ttg tat atg 384
Val Thr Lys Gly Gln Tyr Ser Thr Asn Val Gly Ser Arg Leu Tyr Met
115 120 125

ctc gcc gac gac agc aca tac cag atg tat aat ctg ctg aac cag gag 432
Leu Ala Asp Asp Ser Thr Tyr Gln Met Tyr Asn Leu Leu Asn Gln Glu
130 135 140

ttt acg ttc gac gtt gat gtt tct aat ctt cct tgt ggg ctt aac ggg 480
Phe Thr Phe Asp Val Asp Val Ser Asn Leu Pro Cys Gly Leu Asn Gly
145 150 155 160

gct ctg tat ttc gtc tcg atg gat aag gat ggt ggg atg tcg aag tac 528
Ala Leu Tyr Phe Val Ser Met Asp Lys Asp Gly Gly Met Ser Lys Tyr
165 170 175

tct ggg aac aag gct ggt gcc aag tat gga act ggg tac tgc gac tcc 576
Ser Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser
180 185 190

cag tgt ccc cgc gat ctc aag ttc atc aat gga cag ggc aac gtt gaa 624
Gln Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Gln Gly Asn Val Glu
195 200 205

ggc tgg aag cca tcc tca aat gat gcc aac gca ggc gtc ggg gga cac 672
Gly Trp Lys Pro Ser Ser Asn Asp Ala Asn Ala Gly Val Gly Gly His
210 215 220

ggt tcc tgc tgc gca gag atg gat gtt tgg gag gcc aat tcc atc tcc 720
Gly Ser Cys Cys Ala Glu Met Asp Val Trp Glu Ala Asn Ser Ile Ser
225 230 235 240

gcg gcc gta aca ccg cac tcg tgc tcc aca acc agc cag acg atg tgc 768
Ala Ala Val Thr Pro His Ser Cys Ser Thr Thr Ser Gln Thr Met Cys
245 250 255

aac ggc gac tcc tgc ggc ggt acc tac tca gcc aca cga tac gct ggt 816
Asn Gly Asp Ser Cys Gly Gly Thr Tyr Ser Ala Thr Arg Tyr Ala Gly
260 265 270

gtc tgc gat ccc gat ggc tgc gac ttc aac tcc tac cgt atg ggc gac 864
Val Cys Asp Pro Asp Gly Cys Asp Phe Asn Ser Tyr Arg Met Gly Asp
275 280 285

acg acc ttc tac ggc aag gga aag acg gtc gat acc agc tcc aag ttc 912
Thr Thr Phe Tyr Gly Lys Gly Lys Thr Val Asp Thr Ser Ser Lys Phe
290 295 300

acg gtc gtg acc cag ttc atc acc gac act gga acc gcc tcc ggc tcg 960
Thr Val Val Thr Gln Phe Ile Thr Asp Thr Gly Thr Ala Ser Gly Ser
305 310 315 320

ctc acg gag atc cgc cgc ttc tac gtc cag aac gga aag ttg atc ccc 1008
Leu Thr Glu Ile Arg Arg Phe Tyr Val Gln Asn Gly Lys Leu Ile Pro
325 330 335

aac tcc cag tcg aag atc tcg ggc gtc act ggc aac tcc atc acc tct 1056
Asn Ser Gln Ser Lys Ile Ser Gly Val Thr Gly Asn Ser Ile Thr Ser
340 345 350

gct ttc tgc gac gct cag aag gcg gct ttc ggc gat aac tac acg ttc 1104
Ala Phe Cys Asp Ala Gln Lys Ala Ala Phe Gly Asp Asn Tyr Thr Phe
355 360 365

aag gac aag ggc ggc ttc gca tcc atg act act gct atg aag aac gga 1152
Lys Asp Lys Gly Gly Phe Ala Ser Met Thr Thr Ala Met Lys Asn Gly
370 375 380

atg gtc ctg gtt atg agt ctt tgg gat gac cac tac gcc aat atg ctc 1200
Met Val Leu Val Met Ser Leu Trp Asp Asp His Tyr Ala Asn Met Leu
385 390 395 400

tgg ctt gat agc gac tat ccc act aac gcg gac tcc tcc aag ccg ggt 1248
Trp Leu Asp Ser Asp Tyr Pro Thr Asn Ala Asp Ser Ser Lys Pro Gly
405 410 415

gtt gct cgt ggc acc tgc ccg act tct tcc ggc gtg ccc tcg gat gtc 1296
Val Ala Arg Gly Thr Cys Pro Thr Ser Ser Gly Val Pro Ser Asp Val
420 425 430

gag act aac aat gca agc gct tcg gtc acg tac tcc aac att aga ttt 1344
Glu Thr Asn Asn Ala Ser Ala Ser Val Thr Tyr Ser Asn Ile Arg Phe
435 440 445

gga gat ctc aat tcc act tac acc gcc cag taa 1377
Gly Asp Leu Asn Ser Thr Tyr Thr Ala Gln
450 455



<210> 40 
<211> 458 
<212> PRT 

<213> Trichophaea saccata 
<400> 40 

Met Gln Arg Leu Leu Val Leu Leu Thr Ser Leu Leu Ala Phe Thr Tyr
1 5 10 15

Gly Gln Gln Val Gly Thr Gln Gln Ala Glu Val His Pro Ser Met Thr
20 25 30

Trp Gln Gln Cys Thr Lys Ser Gly Gly Cys Thr Thr Lys Asn Gly Lys
35 40 45

Val Val Ile Asp Ala Asn Trp Arg Trp Val His Asn Val Gly Gly Tyr
50 55 60

Thr Asn Cys Tyr Thr Gly Asn Thr Trp Asp Ser Ser Leu Cys Pro Asp
65 70 75 80

Asp Val Thr Cys Ala Lys Asn Cys Ala Leu Asp Gly Ala Asp Tyr Ser
85 90 95

Gly Thr Tyr Gly Val Thr Ala Gly Gly Asn Ser Leu Lys Leu Thr Phe
100 105 110

Val Thr Lys Gly Gln Tyr Ser Thr Asn Val Gly Ser Arg Leu Tyr Met
115 120 125

Leu Ala Asp Asp Ser Thr Tyr Gln Met Tyr Asn Leu Leu Asn Gln Glu
130 135 140

Phe Thr Phe Asp Val Asp Val Ser Asn Leu Pro Cys Gly Leu Asn Gly
145 150 155 160

Ala Leu Tyr Phe Val Ser Met Asp Lys Asp Gly Gly Met Ser Lys Tyr
165 170 175

Ser Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser
180 185 190

Gln Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Gln Gly Asn Val Glu
195 200 205

Gly Trp Lys Pro Ser Ser Asn Asp Ala Asn Ala Gly Val Gly Gly His
210 215 220

Gly Ser Cys Cys Ala Glu Met Asp Val Trp Glu Ala Asn Ser Ile Ser
225 230 235 240

Ala Ala Val Thr Pro His Ser Cys Ser Thr Thr Ser Gln Thr Met Cys
245 250 255

Asn Gly Asp Ser Cys Gly Gly Thr Tyr Ser Ala Thr Arg Tyr Ala Gly
260 265 270

Val Cys Asp Pro Asp Gly Cys Asp Phe Asn Ser Tyr Arg Met Gly Asp
275 280 285

Thr Thr Phe Tyr Gly Lys Gly Lys Thr Val Asp Thr Ser Ser Lys Phe
290 295 300

Thr Val Val Thr Gln Phe Ile Thr Asp Thr Gly Thr Ala Ser Gly Ser
305 310 315 320

Leu Thr Glu Ile Arg Arg Phe Tyr Val Gln Asn Gly Lys Leu Ile Pro
325 330 335

Asn Ser Gln Ser Lys Ile Ser Gly Val Thr Gly Asn Ser Ile Thr Ser
340 345 350

Ala Phe Cys Asp Ala Gln Lys Ala Ala Phe Gly Asp Asn Tyr Thr Phe
355 360 365

Lys Asp Lys Gly Gly Phe Ala Ser Met Thr Thr Ala Met Lys Asn Gly
370 375 380

Met Val Leu Val Met Ser Leu Trp Asp Asp His Tyr Ala Asn Met Leu
385 390 395 400

Trp Leu Asp Ser Asp Tyr Pro Thr Asn Ala Asp Ser Ser Lys Pro Gly
405 410 415

Val Ala Arg Gly Thr Cys Pro Thr Ser Ser Gly Val Pro Ser Asp Val
420 425 430

Glu Thr Asn Asn Ala Ser Ala Ser Val Thr Tyr Ser Asn Ile Arg Phe
435 440 445

Gly Asp Leu Asn Ser Thr Tyr Thr Ala Gln
450 455



<210> 41 

<211> 1353 

<212> DNA 

<213> Myceliophthora thermophila

<220>

<221> CDS 

<222> (1)..(1353) 

<223> 

<400> 41 

atg aag cag tac ctc cag tac ctc gcg gcg acc ctg ccc ctg gtg ggc 48
Met Lys Gln Tyr Leu Gln Tyr Leu Ala Ala Thr Leu Pro Leu Val Gly
1 5 10 15

ctg gcc acg gcc cag cag gcg ggt aac ctg cag acc gag act cac ccc 96
Leu Ala Thr Ala Gln Gln Ala Gly Asn Leu Gln Thr Glu Thr His Pro
20 25 30

agg ctc act tgg tcc aag tgc acg gcc ccg gga tcc tgc caa cag gtc 144
Arg Leu Thr Trp Ser Lys Cys Thr Ala Pro Gly Ser Cys Gln Gln Val
35 40 45

aac ggc gag gtc gtc atc gac tcc aac tgg cgc tgg gtg cac gac gag 192
Asn Gly Glu Val Val Ile Asp Ser Asn Trp Arg Trp Val His Asp Glu
50 55 60

aac gcg cag aac tgc tac gac ggc aac cag tgg acc aac gct tgc agc 240
Asn Ala Gln Asn Cys Tyr Asp Gly Asn Gln Trp Thr Asn Ala Cys Ser
65 70 75 80

tct gcc acc gac tgc gcc gag aat tgc gcg ctc gag ggt gcc gac tac 288
Ser Ala Thr Asp Cys Ala Glu Asn Cys Ala Leu Glu Gly Ala Asp Tyr
85 90 95

cag ggc acc tat ggc gcc tcg acc agc ggc aat gcc ctg acg ctc acc 336
Gln Gly Thr Tyr Gly Ala Ser Thr Ser Gly Asn Ala Leu Thr Leu Thr
100 105 110

ttc gtc act aag cac gag tac ggc acc aac att ggc tcg cgc ctc tac 384
Phe Val Thr Lys His Glu Tyr Gly Thr Asn Ile Gly Ser Arg Leu Tyr
115 120 125

ctc atg aac ggc gcg aac aag tac cag atg ttc acc ctc aag ggc aac 432
Leu Met Asn Gly Ala Asn Lys Tyr Gln Met Phe Thr Leu Lys Gly Asn
130 135 140

gag ctg gcc ttc gac gtc gac ctc tcg gcc gtc gag tgc ggc ctc aac 480
Glu Leu Ala Phe Asp Val Asp Leu Ser Ala Val Glu Cys Gly Leu Asn
145 150 155 160

agc gcc ctc tac ttc gtg gcc atg gag gag gat ggc ggt gtg tcg agc 528
Ser Ala Leu Tyr Phe Val Ala Met Glu Glu Asp Gly Gly Val Ser Ser
165 170 175

tac ccg acc aac acg gcc ggt gct aag ttc ggc act ggg tac tgc gac 576
Tyr Pro Thr Asn Thr Ala Gly Ala Lys Phe Gly Thr Gly Tyr Cys Asp
180 185 190

gcc caa tgc gca cgc gac ctc aag ttc gtc ggc ggc aag ggc aac atc 624
Ala Gln Cys Ala Arg Asp Leu Lys Phe Val Gly Gly Lys Gly Asn Ile
195 200 205

gag ggc tgg aag ccg tcc acc aac gat gcc aat gcc ggt gtc ggt cct 672
Glu Gly Trp Lys Pro Ser Thr Asn Asp Ala Asn Ala Gly Val Gly Pro
210 215 220

tat ggc ggg tgc tgc gct gag atc gac gtc tgg gag tcg aac aag tat 720
Tyr Gly Gly Cys Cys Ala Glu Ile Asp Val Trp Glu Ser Asn Lys Tyr
225 230 235 240

gct ttc gct ttc acc ccg cac ggt tgc gag aac cct aaa tac cac gtc 768
Ala Phe Ala Phe Thr Pro His Gly Cys Glu Asn Pro Lys Tyr His Val
245 250 255

tgc gag acc acc aac tgc ggt ggc acc tac tcc gag gac cgc ttc gct 816
Cys Glu Thr Thr Asn Cys Gly Gly Thr Tyr Ser Glu Asp Arg Phe Ala
260 265 270

ggt gac tgc gat gcc aac ggc tgc gac tac aac ccc tac cgc atg ggc 864
Gly Asp Cys Asp Ala Asn Gly Cys Asp Tyr Asn Pro Tyr Arg Met Gly
275 280 285

aac cag gac ttc tac ggt ccc ggc ttg acg gtc gat acc agc aag aag 912
Asn Gln Asp Phe Tyr Gly Pro Gly Leu Thr Val Asp Thr Ser Lys Lys
290 295 300

ttc acc gtc gtc agc cag ttc gag gag aac aag ctc acc cag ttc ttc 960
Phe Thr Val Val Ser Gln Phe Glu Glu Asn Lys Leu Thr Gln Phe Phe
305 310 315 320

gtc cag gac ggc aag aag att gag atc ccc ggc ccc aag gtc gag ggc 1008
Val Gln Asp Gly Lys Lys Ile Glu Ile Pro Gly Pro Lys Val Glu Gly
325 330 335

atc gat gcg gac agc gcc gct atc acc cct gag ctg tgc agt gcc ctg 1056
Ile Asp Ala Asp Ser Ala Ala Ile Thr Pro Glu Leu Cys Ser Ala Leu
340 345 350

ttc aag gcc ttc gat gac cgt gac cgc ttc tcg gag gtt ggc ggc ttc 1104
Phe Lys Ala Phe Asp Asp Arg Asp Arg Phe Ser Glu Val Gly Gly Phe
355 360 365

gat gcc atc aac acg gcc ctc agc act ccc atg gtc ctc gtc atg tcc 1152
Asp Ala Ile Asn Thr Ala Leu Ser Thr Pro Met Val Leu Val Met Ser
370 375 380

atc tgg gat gat cac tac gcc aat atg ctc tgg ctc gac tcg agc tac 1200
Ile Trp Asp Asp His Tyr Ala Asn Met Leu Trp Leu Asp Ser Ser Tyr
385 390 395 400

ccc cct gag aag gct ggc cag cct ggc ggt gac cgt ggc ccg tgt cct 1248
Pro Pro Glu Lys Ala Gly Gln Pro Gly Gly Asp Arg Gly Pro Cys Pro
405 410 415

cag gac tct ggc gtc ccg gcc gac gtt gag gct cag tac cct aat gcc 1296
Gln Asp Ser Gly Val Pro Ala Asp Val Glu Ala Gln Tyr Pro Asn Ala
420 425 430

aag gtc atc tgg tcc aac atc cgc ttc ggc ccc atc ggc tcg act gtc 1344
Lys Val Ile Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thr Val
435 440 445

aac gtc taa 1353
Asn Val
450



<210> 42 
<211> 450 
<212> PRT 

<213> Myceliophthora thermophila 
<400> 42 

Met Lys Gln Tyr Leu Gln Tyr Leu Ala Ala Thr Leu Pro Leu Val Gly
1 5 10 15

Leu Ala Thr Ala Gln Gln Ala Gly Asn Leu Gln Thr Glu Thr His Pro
20 25 30

Arg Leu Thr Trp Ser Lys Cys Thr Ala Pro Gly Ser Cys Gln Gln Val
35 40 45

Asn Gly Glu Val Val Ile Asp Ser Asn Trp Arg Trp Val His Asp Glu
50 55 60

Asn Ala Gln Asn Cys Tyr Asp Gly Asn Gln Trp Thr Asn Ala Cys Ser
65 70 75 80

Ser Ala Thr Asp Cys Ala Glu Asn Cys Ala Leu Glu Gly Ala Asp Tyr
85 90 95

Gln Gly Thr Tyr Gly Ala Ser Thr Ser Gly Asn Ala Leu Thr Leu Thr
100 105 110

Phe Val Thr Lys His Glu Tyr Gly Thr Asn Ile Gly Ser Arg Leu Tyr
115 120 125

Leu Met Asn Gly Ala Asn Lys Tyr Gln Met Phe Thr Leu Lys Gly Asn
130 135 140

Glu Leu Ala Phe Asp Val Asp Leu Ser Ala Val Glu Cys Gly Leu Asn
145 150 155 160

Ser Ala Leu Tyr Phe Val Ala Met Glu Glu Asp Gly Gly Val Ser Ser
165 170 175

Tyr Pro Thr Asn Thr Ala Gly Ala Lys Phe Gly Thr Gly Tyr Cys Asp
180 185 190

Ala Gln Cys Ala Arg Asp Leu Lys Phe Val Gly Gly Lys Gly Asn Ile
195 200 205

Glu Gly Trp Lys Pro Ser Thr Asn Asp Ala Asn Ala Gly Val Gly Pro
210 215 220

Tyr Gly Gly Cys Cys Ala Glu Ile Asp Val Trp Glu Ser Asn Lys Tyr
225 230 235 240

Ala Phe Ala Phe Thr Pro His Gly Cys Glu Asn Pro Lys Tyr His Val
245 250 255

Cys Glu Thr Thr Asn Cys Gly Gly Thr Tyr Ser Glu Asp Arg Phe Ala
260 265 270

Gly Asp Cys Asp Ala Asn Gly Cys Asp Tyr Asn Pro Tyr Arg Met Gly
275 280 285

Asn Gln Asp Phe Tyr Gly Pro Gly Leu Thr Val Asp Thr Ser Lys Lys
290 295 300

Phe Thr Val Val Ser Gln Phe Glu Glu Asn Lys Leu Thr Gln Phe Phe
305 310 315 320

Val Gln Asp Gly Lys Lys Ile Glu Ile Pro Gly Pro Lys Val Glu Gly
325 330 335

Ile Asp Ala Asp Ser Ala Ala Ile Thr Pro Glu Leu Cys Ser Ala Leu
340 345 350

Phe Lys Ala Phe Asp Asp Arg Asp Arg Phe Ser Glu Val Gly Gly Phe
355 360 365

Asp Ala Ile Asn Thr Ala Leu Ser Thr Pro Met Val Leu Val Met Ser
370 375 380

Ile Trp Asp Asp His Tyr Ala Asn Met Leu Trp Leu Asp Ser Ser Tyr
385 390 395 400

Pro Pro Glu Lys Ala Gly Gln Pro Gly Gly Asp Arg Gly Pro Cys Pro
405 410 415

Gln Asp Ser Gly Val Pro Ala Asp Val Glu Ala Gln Tyr Pro Asn Ala
420 425 430

Lys Val Ile Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thr Val
435 440 445

Asn Val
450



<210> 43 
<211> 1341 
<212> DNA 
<213> Xylaria hypoxylon
<220> 

<221> CDS 

<222> (1)..(1341) 

<223> 

<400> 43 

atg ttg tcc ctc gcc gtg tcg gcc gcc ctt ctc ggg ctc gcg tct gcc 48
Met Leu Ser Leu Ala Val Ser Ala Ala Leu Leu Gly Leu Ala Ser Ala
1 5 10 15

cag cag gtt gga aag gag caa tct gag act cac cct aag ctg tct tgg 96
Gln Gln Val Gly Lys Glu Gln Ser Glu Thr His Pro Lys Leu Ser Trp
20 25 30

aag aag tgc acc agc ggt ggt tcc tgc acc cag acc aac gct gag gtg 144
Lys Lys Cys Thr Ser Gly Gly Ser Cys Thr Gln Thr Asn Ala Glu Val
35 40 45

acc atc gac tct aac tgg cga tgg ctt cac tct ctc gaa ggc act gag 192
Thr Ile Asp Ser Asn Trp Arg Trp Leu His Ser Leu Glu Gly Thr Glu
50 55 60

aac tgc tac gat ggt aac aag tgg acc tcg cag tgc agc act ggc gag 240
Asn Cys Tyr Asp Gly Asn Lys Trp Thr Ser Gln Cys Ser Thr Gly Glu
65 70 75 80

gac tgc gcc acc aag tgc gcc atc gag ggt gcc gac tac agc aag acc 288
Asp Cys Ala Thr Lys Cys Ala Ile Glu Gly Ala Asp Tyr Ser Lys Thr
85 90 95

tac ggt gcc tct act agc ggc gat gct ctt acc ctc aag ttc ctg acc 336
Tyr Gly Ala Ser Thr Ser Gly Asp Ala Leu Thr Leu Lys Phe Leu Thr
100 105 110

aag cac gag tac gga acc aac atc ggc tcc cga ttc tac ctt atg aat 384
Lys His Glu Tyr Gly Thr Asn Ile Gly Ser Arg Phe Tyr Leu Met Asn
115 120 125

ggt gcc gac aag tac cag acc ttc gac ctc aag ggt aac gag ttc acc 432
Gly Ala Asp Lys Tyr Gln Thr Phe Asp Leu Lys Gly Asn Glu Phe Thr
130 135 140

ttc gat gtc gac ctg tcc acc gtc gac tgt ggt ctt aac gcc gct ctt 480
Phe Asp Val Asp Leu Ser Thr Val Asp Cys Gly Leu Asn Ala Ala Leu
145 150 155 160

tac ttc gtc gcc atg gag gaa gac ggt ggc atg gct agc tac ccc aac 528
Tyr Phe Val Ala Met Glu Glu Asp Gly Gly Met Ala Ser Tyr Pro Asn
165 170 175

aac aag gcc ggt gcc aag tac ggt acc ggt tac tgt gac gct cag tgt 576
Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ala Gln Cys
180 185 190

gcc cgt gac ttg aag ttc gtc ggt ggc aag ggc aac gtt gag gga tgg 624
Ala Arg Asp Leu Lys Phe Val Gly Gly Lys Gly Asn Val Glu Gly Trp
195 200 205

gag cca tcc acc aac gac gac aac gcc ggt gtt ggc cct tac ggt gcc 672
Glu Pro Ser Thr Asn Asp Asp Asn Ala Gly Val Gly Pro Tyr Gly Ala 
210 215 220 

tgc tgt gcc gaa atc gat gtc tgg gag tcc aac tct cac tct ttc gct 720
Cys Cys Ala Glu Ile Asp Val Trp Glu Ser Asn Ser His Ser Phe Ala
225 230 235 240

ttc acc cct cac cct tgc acc acc aac gaa tac cac gtc tgt gag cag 768
Phe Thr Pro His Pro Cys Thr Thr Asn Glu Tyr His Val Cys Glu Gln
245 250 255

gac gag tgt ggt ggt acc tac tct gag gac cga ttc gct ggc aag tgt 816
Asp Glu Cys Gly Gly Thr Tyr Ser Glu Asp Arg Phe Ala Gly Lys Cys
260 265 270

gat gcc aac ggt tgt gac tac aac cct tac cgc atg ggt aac acc gac 864
Asp Ala Asn Gly Cys Asp Tyr Asn Pro Tyr Arg Met Gly Asn Thr Asp
275 280 285

ttc tac ggc cag ggc aag acc gtc gac acc agc aag aaa ttc act gtt 912
Phe Tyr Gly Gln Gly Lys Thr Val Asp Thr Ser Lys Lys Phe Thr Val
290 295 300

gtc acc cag ttc gcc gaa aac aag ttg act cag ttc ttc gtc cag gac 960
Val Thr Gln Phe Ala Glu Asn Lys Leu Thr Gln Phe Phe Val Gln Asp
305 310 315 320

ggt aag aag att gag atc ccc ggt ccc aag att gac ggt ttc cct acc 1008
Gly Lys Lys Ile Glu Ile Pro Gly Pro Lys Ile Asp Gly Phe Pro Thr
325 330 335

gat agc gcc atc acc ccc gag tac tgc act gcc gaa ttc aac gtt cta 1056
Asp Ser Ala Ile Thr Pro Glu Tyr Cys Thr Ala Glu Phe Asn Val Leu
340 345 350

gga gac cgt gac cgc ttc agt gaa gtt ggt ggc ttc gac cag ctc aac 1104
Gly Asp Arg Asp Arg Phe Ser Glu Val Gly Gly Phe Asp Gln Leu Asn
355 360 365

aac gct ctt gac gta ccc atg gtc ctt gtc atg tcc atc tgg gac gac 1152
Asn Ala Leu Asp Val Pro Met Val Leu Val Met Ser Ile Trp Asp Asp
370 375 380

cac tac gcc aac atg ctt tgg ctc gac tcc agc tac ccc cct gag aag 1200
His Tyr Ala Asn Met Leu Trp Leu Asp Ser Ser Tyr Pro Pro Glu Lys
385 390 395 400

gct ggc cag ccc ggt ggt gac cgt ggt gac tgt gcc ccc gac tcc ggt 1248
Ala Gly Gln Pro Gly Gly Asp Arg Gly Asp Cys Ala Pro Asp Ser Gly
405 410 415

gtc ccc tcc gac gtc gag gcc agc atc ccc gat gcc aag gtc gtc tgg 1296
Val Pro Ser Asp Val Glu Ala Ser Ile Pro Asp Ala Lys Val Val Trp
420 425 430

tcc aac atc cgc ttc ggt ccc atc ggc tct act gtc gag gtt taa 1341
Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thr Val Glu Val
435 440 445



<210> 44 
<211> 446 
<212> PRT

<213> Xylaria hypoxylon 
<400> 44 

Met Leu Ser Leu Ala Val Ser Ala Ala Leu Leu Gly Leu Ala Ser Ala 
1 5 io 15 

Gin Gin Val Gly Lys Glu Gin Ser Glu Thr His Pro Lys Leu Ser Trp 
20 25 30 

Lys Lys Cys Thr Ser Gly Gly Ser Cys Thr Gin Thr Asn Ala Glu Val 
35 40 45 

Thr He Asp Ser Asn Trp Arg Trp Leu His Ser Leu Glu Gly Thr Glu 
50 55 60 

Asn Cys Tyr Asp Gly Asn Lys Trp Thr Ser Gin Cys Ser Thr Gly Glu 
65 70 75 80 

Asp Cys Ala Thr Lys Cys Ala He Glu Gly Ala Asp Tyr Ser Lys Thr 
85 90 95 

Tyr Gly Ala Ser Thr Ser Gly Asp Ala Leu Thr Leu Lys Phe Leu Thr 
100 105 no 

Lys His Glu Tyr Gly Thr Asn He Gly Ser Arg Phe Tyr Leu Met Asn 
115 120 125 

Gly Ala Asp Lys Tyr Gin Thr Phe Asp Leu Lys Gly Asn Glu Phe Thr 
130 135 140 

Phe Asp Val Asp Leu Ser Thr Val Asp Cys Gly Leu Asn Ala Ala Leu 
145 150 155 160 

Tyr Phe Val Ala Met Glu Glu Asp Gly Gly Met Ala Ser Tyr Pro Asn 
165 170 175 

Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ala Gin Cys 
180 185 190 

Ala Arg Asp Leu Lys Phe Val Gly Gly Lys Gly Asn Val Glu Gly Trp 
195 200 205 

Glu Pro Ser Thr Asn Asp Asp Asn Ala Gly Val Gly Pro Tyr Gly Ala 
210 215 220 

Cys Cys Ala Glu He Asp Val Trp Glu Ser Asn Ser His Ser Phe Ala 
225 230 235 240 

Phe Thr Pro His Pro Cys Thr Thr Asn Glu Tyr His Val Cys Glu Gin 
245 250 255 

Asp Glu Cys Gly Gly Thr Tyr Ser Glu Asp Arg Phe Ala Gly Lys Cys 
260 265 270 

Asp Ala Asn Gly Cys Asp Tyr Asn Pro Tyr Arg Met Gly Asn Thr Asp 
275 280 285 

Phe Tyr Gly Gln Gly Lys Thr Val Asp Thr Ser Lys Lys Phe Thr Val
290 295 300

Val Thr Gln Phe Ala Glu Asn Lys Leu Thr Gln Phe Phe Val Gln Asp
305 310 315 320

Gly Lys Lys Ile Glu Ile Pro Gly Pro Lys Ile Asp Gly Phe Pro Thr
325 330 335

Asp Ser Ala Ile Thr Pro Glu Tyr Cys Thr Ala Glu Phe Asn Val Leu
340 345 350

Gly Asp Arg Asp Arg Phe Ser Glu Val Gly Gly Phe Asp Gln Leu Asn
355 360 365

Asn Ala Leu Asp Val Pro Met Val Leu Val Met Ser Ile Trp Asp Asp
370 375 380

His Tyr Ala Asn Met Leu Trp Leu Asp Ser Ser Tyr Pro Pro Glu Lys 
385 390 395 400 

Ala Gly Gln Pro Gly Gly Asp Arg Gly Asp Cys Ala Pro Asp Ser Gly
405 410 415

Val Pro Ser Asp Val Glu Ala Ser Ile Pro Asp Ala Lys Val Val Trp
420 425 430

Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thr Val Glu Val
435 440 445



<210> 45 

<211> 1584 

<212> DNA 

<213> Exidia glandulosa 
<220> 

<221> CDS 

<222> (1)..(1584) 

<223> 

<400> 45 

atg tac gcc aag ttc gct acc ctc gct gcc ctc gtg gca gct gcc agc 48
Met Tyr Ala Lys Phe Ala Thr Leu Ala Ala Leu Val Ala Ala Ala Ser
1 5 10 15

gcc cag cag gca tgc aca ctc acc gcc gag aac cat ccc tcc atg act 96
Ala Gln Gln Ala Cys Thr Leu Thr Ala Glu Asn His Pro Ser Met Thr
20 25 30

tgg tct aag tgt gcc gcc gga ggt agc tgc act tcg gtt tct ggt tca 144
Trp Ser Lys Cys Ala Ala Gly Gly Ser Cys Thr Ser Val Ser Gly Ser
35 40 45

gtc acc atc gat gcc aac tgg cga tgg ctt cac cag ctc aac agc gcc 192
Val Thr Ile Asp Ala Asn Trp Arg Trp Leu His Gln Leu Asn Ser Ala
50 55 60

acc aac tgc tac gac ggc aac aag tgg aac acc acc tac tgc agc aca 240
Thr Asn Cys Tyr Asp Gly Asn Lys Trp Asn Thr Thr Tyr Cys Ser Thr
65 70 75 80

gat gct act tgc gct gct cag tgc tgt gtt gat ggc tca gac tat gct 288
Asp Ala Thr Cys Ala Ala Gln Cys Cys Val Asp Gly Ser Asp Tyr Ala
85 90 95

ggc acc tac ggt gcc acc act agc ggt aac gct ctg aac ctc aag ttc 336
Gly Thr Tyr Gly Ala Thr Thr Ser Gly Asn Ala Leu Asn Leu Lys Phe
100 105 110

gtc acc caa ggg tcc tat tct aag aac atc ggt tcc cgg ttg tac ctc 384
Val Thr Gln Gly Ser Tyr Ser Lys Asn Ile Gly Ser Arg Leu Tyr Leu
115 120 125

atg gag tcg gat acc aag tat cag atg ttt caa ctg ctc ggc cag gag 432
Met Glu Ser Asp Thr Lys Tyr Gln Met Phe Gln Leu Leu Gly Gln Glu
130 135 140

ttc act ttc gac gta gat gtc tcc aac ttg ggc tgc ggt ctc aac ggt 480
Phe Thr Phe Asp Val Asp Val Ser Asn Leu Gly Cys Gly Leu Asn Gly
145 150 155 160

gcc ctc tac ttc gtc agc atg gac gct gac ggt ggc acg tcc aag tat 528
Ala Leu Tyr Phe Val Ser Met Asp Ala Asp Gly Gly Thr Ser Lys Tyr
165 170 175

acc ggc aac aag gcc ggc gcc aag tat ggc act ggc tac tgc gac agc 576
Thr Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser
180 185 190

cag tgc ccg cgc gac ctg aag ttc atc aat ggt cag gcc aac gtc gag 624
Gln Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Gln Ala Asn Val Glu
195 200 205

ggc tgg act cct tcc acc aac gat gcc aac gcc ggc att ggc acc cac 672
Gly Trp Thr Pro Ser Thr Asn Asp Ala Asn Ala Gly Ile Gly Thr His
210 215 220

ggc tcc tgc tgt tcg gag atg gac atc tgg gag gct aac aat gtt gcc 720
Gly Ser Cys Cys Ser Glu Met Asp Ile Trp Glu Ala Asn Asn Val Ala
225 230 235 240

gct gcg tac acc ccc cat cct tgc aca act atc ggc cag tcg atc tgc 768
Ala Ala Tyr Thr Pro His Pro Cys Thr Thr Ile Gly Gln Ser Ile Cys
245 250 255

tcg ggc gat tct tgc gga gga acc tac agc tct gac cgt tac gcc ggt 816
Ser Gly Asp Ser Cys Gly Gly Thr Tyr Ser Ser Asp Arg Tyr Ala Gly
260 265 270

gtc tgc gat cca gac ggt tgc gat ttc aac agc tac cgc atg ggc gac 864
Val Cys Asp Pro Asp Gly Cys Asp Phe Asn Ser Tyr Arg Met Gly Asp
275 280 285

acg ggc ttc tac ggc aag ggc ctg aca gtc gac acg agc tcc aag ttc 912
Thr Gly Phe Tyr Gly Lys Gly Leu Thr Val Asp Thr Ser Ser Lys Phe
290 295 300

acc gtc gtc acc cag ttc ctc acc ggc tcc gac ggc aac ctt tcc gag 960
Thr Val Val Thr Gln Phe Leu Thr Gly Ser Asp Gly Asn Leu Ser Glu
305 310 315 320

atc aag cgc ttc tac gtc cag aac ggc aag gtc att ccc aac tcg cag 1008
Ile Lys Arg Phe Tyr Val Gln Asn Gly Lys Val Ile Pro Asn Ser Gln
325 330 335

tcc aag att gcc ggc gtc agc ggc aac tcc atc acc acc gac ttc tgc 1056
Ser Lys Ile Ala Gly Val Ser Gly Asn Ser Ile Thr Thr Asp Phe Cys
340 345 350

tcc gcc cag aag acc gcc ttc ggc gac acc aac gtc ttc gcg caa aag 1104
Ser Ala Gln Lys Thr Ala Phe Gly Asp Thr Asn Val Phe Ala Gln Lys
355 360 365

gga ggt ctc gcc ggg atg ggc gcc gcc ctc aag gcc ggc atg gtc ctc 1152
Gly Gly Leu Ala Gly Met Gly Ala Ala Leu Lys Ala Gly Met Val Leu
370 375 380

gtc atg tcc atc tgg gac gac cac gca gtc aac atg ctg tgg ctg gac 1200
Val Met Ser Ile Trp Asp Asp His Ala Val Asn Met Leu Trp Leu Asp
385 390 395 400

tcg acc tac ccg acc gac agc acc aag ccc ggc gcg gcc cgc ggc acc 1248
Ser Thr Tyr Pro Thr Asp Ser Thr Lys Pro Gly Ala Ala Arg Gly Thr
405 410 415

tgc ccg acc acc tcc ggc gtc ccc gcc gac gtc gag gcc cag gtc ccc 1296
Cys Pro Thr Thr Ser Gly Val Pro Ala Asp Val Glu Ala Gln Val Pro
420 425 430

aac tcg aac gtc atc tac tcc aac atc aag gtc ggc ccc atc aac tcg 1344
Asn Ser Asn Val Ile Tyr Ser Asn Ile Lys Val Gly Pro Ile Asn Ser
435 440 445

act ttc acc ggc ggc act tcc ggc ggc ggc ggt agc agc agc agc tcc 1392
Thr Phe Thr Gly Gly Thr Ser Gly Gly Gly Gly Ser Ser Ser Ser Ser
450 455 460

acc acc atc cga acc agc acc acc agc act cgc acc acc agc acc agc 1440
Thr Thr Ile Arg Thr Ser Thr Thr Ser Thr Arg Thr Thr Ser Thr Ser
465 470 475 480

acc gcg ccc ggc ggc ggc tcc act ggc agc gcc ggc gcc gat cac tgg 1488
Thr Ala Pro Gly Gly Gly Ser Thr Gly Ser Ala Gly Ala Asp His Trp
485 490 495

gcg caa tgc ggc ggt atc ggc tgg act ggt ccc acg acc tgc aag agc 1536
Ala Gln Cys Gly Gly Ile Gly Trp Thr Gly Pro Thr Thr Cys Lys Ser
500 505 510

ccg tac acg tgc aca gcc tcc aac ccg tac tac tcg cag tgc ttg taa 1584
Pro Tyr Thr Cys Thr Ala Ser Asn Pro Tyr Tyr Ser Gln Cys Leu
515 520 525



<210> 46 
<211> 527 
<212> PRT 

<213> Exidia glandulosa 
<400> 46 

Met Tyr Ala Lys Phe Ala Thr Leu Ala Ala Leu Val Ala Ala Ala Ser 
1 5 10 15

Ala Gln Gln Ala Cys Thr Leu Thr Ala Glu Asn His Pro Ser Met Thr
20 25 30

Trp Ser Lys Cys Ala Ala Gly Gly Ser Cys Thr Ser Val Ser Gly Ser
35 40 45

Val Thr Ile Asp Ala Asn Trp Arg Trp Leu His Gln Leu Asn Ser Ala
50 55 60

Thr Asn Cys Tyr Asp Gly Asn Lys Trp Asn Thr Thr Tyr Cys Ser Thr 
65 70 75 80 

Asp Ala Thr Cys Ala Ala Gln Cys Cys Val Asp Gly Ser Asp Tyr Ala
85 90 95

Gly Thr Tyr Gly Ala Thr Thr Ser Gly Asn Ala Leu Asn Leu Lys Phe 
100 105 110 

Val Thr Gln Gly Ser Tyr Ser Lys Asn Ile Gly Ser Arg Leu Tyr Leu
115 120 125

Met Glu Ser Asp Thr Lys Tyr Gln Met Phe Gln Leu Leu Gly Gln Glu
130 135 140

Phe Thr Phe Asp Val Asp Val Ser Asn Leu Gly Cys Gly Leu Asn Gly 
145 150 155 160 

Ala Leu Tyr Phe Val Ser Met Asp Ala Asp Gly Gly Thr Ser Lys Tyr 
165 170 175 

Thr Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser 
180 185 190 

Gln Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Gln Ala Asn Val Glu
195 200 205

Gly Trp Thr Pro Ser Thr Asn Asp Ala Asn Ala Gly Ile Gly Thr His
210 215 220

Gly Ser Cys Cys Ser Glu Met Asp Ile Trp Glu Ala Asn Asn Val Ala
225 230 235 240

Ala Ala Tyr Thr Pro His Pro Cys Thr Thr Ile Gly Gln Ser Ile Cys
245 250 255

Ser Gly Asp Ser Cys Gly Gly Thr Tyr Ser Ser Asp Arg Tyr Ala Gly 
260 265 270 

Val Cys Asp Pro Asp Gly Cys Asp Phe Asn Ser Tyr Arg Met Gly Asp 
275 280 285 

Thr Gly Phe Tyr Gly Lys Gly Leu Thr Val Asp Thr Ser Ser Lys Phe 
290 295 300 

Thr Val Val Thr Gln Phe Leu Thr Gly Ser Asp Gly Asn Leu Ser Glu
305 310 315 320

Ile Lys Arg Phe Tyr Val Gln Asn Gly Lys Val Ile Pro Asn Ser Gln
325 330 335

Ser Lys Ile Ala Gly Val Ser Gly Asn Ser Ile Thr Thr Asp Phe Cys
340 345 350

Ser Ala Gln Lys Thr Ala Phe Gly Asp Thr Asn Val Phe Ala Gln Lys
355 360 365

Gly Gly Leu Ala Gly Met Gly Ala Ala Leu Lys Ala Gly Met Val Leu
370 375 380

Val Met Ser Ile Trp Asp Asp His Ala Val Asn Met Leu Trp Leu Asp
385 390 395 400

Ser Thr Tyr Pro Thr Asp Ser Thr Lys Pro Gly Ala Ala Arg Gly Thr 
405 410 415 

Cys Pro Thr Thr Ser Gly Val Pro Ala Asp Val Glu Ala Gln Val Pro
420 425 430

Asn Ser Asn Val Ile Tyr Ser Asn Ile Lys Val Gly Pro Ile Asn Ser
435 440 445

Thr Phe Thr Gly Gly Thr Ser Gly Gly Gly Gly Ser Ser Ser Ser Ser 
450 455 460 

Thr Thr Ile Arg Thr Ser Thr Thr Ser Thr Arg Thr Thr Ser Thr Ser
465 470 475 480

Thr Ala Pro Gly Gly Gly Ser Thr Gly Ser Ala Gly Ala Asp His Trp
485 490 495

Ala Gln Cys Gly Gly Ile Gly Trp Thr Gly Pro Thr Thr Cys Lys Ser
500 505 510

Pro Tyr Thr Cys Thr Ala Ser Asn Pro Tyr Tyr Ser Gln Cys Leu
515 520 525



<210> 47 

<211> 1368 

<212> DNA 

<213> Exidia glandulosa 
<220> 

<221> CDS 

<222> (1)..(1368) 

<223> 

<400> 47 

atg tac gcc aag ttc gct acc ctc gct gcc ctc gtg gca gct gcc agc 48
Met Tyr Ala Lys Phe Ala Thr Leu Ala Ala Leu Val Ala Ala Ala Ser
1 5 10 15

gcc cag cag gca tgc aca ctc acc gcc gag aac cat ccc tcc atg act 96
Ala Gln Gln Ala Cys Thr Leu Thr Ala Glu Asn His Pro Ser Met Thr
20 25 30

tgg tct aag tgt gcc gcc gga ggt agc tgc act tcg gtt tct ggt tca 144
Trp Ser Lys Cys Ala Ala Gly Gly Ser Cys Thr Ser Val Ser Gly Ser
35 40 45

gtc acc atc gat gcc aac tgg cga tgg ctt cac cag ctc aac agc gcc 192
Val Thr Ile Asp Ala Asn Trp Arg Trp Leu His Gln Leu Asn Ser Ala
50 55 60

acc aac tgc tac gac ggc aac aag tgg aac acc acc tac tgc agc aca 240
Thr Asn Cys Tyr Asp Gly Asn Lys Trp Asn Thr Thr Tyr Cys Ser Thr
65 70 75 80

gat gct act tgc gct gct cag tgc tgt gtt gat ggc tca gac tat gct 288
Asp Ala Thr Cys Ala Ala Gln Cys Cys Val Asp Gly Ser Asp Tyr Ala
85 90 95

ggc acc tac ggt gcc acc act agc ggt aac gct ctg aac ctc aag ttc 336
Gly Thr Tyr Gly Ala Thr Thr Ser Gly Asn Ala Leu Asn Leu Lys Phe
100 105 110

gtc acc caa ggg tcc tat tct aag aac atc ggt tcc cgg ttg tac ctc 384
Val Thr Gln Gly Ser Tyr Ser Lys Asn Ile Gly Ser Arg Leu Tyr Leu
115 120 125

atg gag tcg gat acc aag tat cag atg ttt caa ctg ctc ggc cag gag 432
Met Glu Ser Asp Thr Lys Tyr Gln Met Phe Gln Leu Leu Gly Gln Glu
130 135 140

ttc act ttc gac gta gat gtc tcc aac ttg ggc tgc ggt ctc aac ggt 480
Phe Thr Phe Asp Val Asp Val Ser Asn Leu Gly Cys Gly Leu Asn Gly
145 150 155 160

gcc ctc tac ttc gtc agc atg gac gct gac ggt ggc acg tcc aag tat 528
Ala Leu Tyr Phe Val Ser Met Asp Ala Asp Gly Gly Thr Ser Lys Tyr
165 170 175

acc ggc aac aag gcc ggc gcc aag tat ggc act ggc tac tgc gac agc 576
Thr Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser
180 185 190

cag tgc ccg cgc gac ctg aag ttc atc aat ggt cag gcc aac gtc gag 624
Gln Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Gln Ala Asn Val Glu
195 200 205

ggc tgg act cct tcc acc aac gat gcc aac gcc ggc att ggc acc cac 672
Gly Trp Thr Pro Ser Thr Asn Asp Ala Asn Ala Gly Ile Gly Thr His
210 215 220

ggc tcc tgc tgt tcg gag atg gac atc tgg gag gct aac aat gtt gcc 720
Gly Ser Cys Cys Ser Glu Met Asp Ile Trp Glu Ala Asn Asn Val Ala
225 230 235 240

gct gcg tac acc ccc cat cct tgc aca act atc ggc cag tcg atc tgc 768
Ala Ala Tyr Thr Pro His Pro Cys Thr Thr Ile Gly Gln Ser Ile Cys
245 250 255

tcg ggc gat tct tgc gga gga acc tac agc tct gac cgt tac gcc ggt 816
Ser Gly Asp Ser Cys Gly Gly Thr Tyr Ser Ser Asp Arg Tyr Ala Gly
260 265 270

gtc tgc gat cca gac ggt tgc gat ttc aac agc tac cgc atg ggc gac 864
Val Cys Asp Pro Asp Gly Cys Asp Phe Asn Ser Tyr Arg Met Gly Asp
275 280 285

acg ggc ttc tac ggc aag ggc ctg aca gtc gac acg agc tcc aag ttc 912
Thr Gly Phe Tyr Gly Lys Gly Leu Thr Val Asp Thr Ser Ser Lys Phe
290 295 300

acc gtc gtc acc cag ttc ctc acc ggc tcc gac ggc aac ctt tcc gag 960
Thr Val Val Thr Gln Phe Leu Thr Gly Ser Asp Gly Asn Leu Ser Glu
305 310 315 320

atc aag cgc ttc tac gtc cag aac ggc aag gtc att ccc aac tcg cag 1008
Ile Lys Arg Phe Tyr Val Gln Asn Gly Lys Val Ile Pro Asn Ser Gln
325 330 335

tcc aag att gcc ggc gtc agc ggc aac tcc atc acc acc gac ttc tgc 1056
Ser Lys Ile Ala Gly Val Ser Gly Asn Ser Ile Thr Thr Asp Phe Cys
340 345 350

tcc gcc cag aag acc gcc ttc ggc gac acc aac gtc ttc gcg caa aag 1104
Ser Ala Gln Lys Thr Ala Phe Gly Asp Thr Asn Val Phe Ala Gln Lys
355 360 365

gga ggt ctc gcc ggg atg ggc gcc gcc ctc aag gcc ggc atg gtc ctc 1152
Gly Gly Leu Ala Gly Met Gly Ala Ala Leu Lys Ala Gly Met Val Leu
370 375 380

gtc atg tcc atc tgg gac gat cac tac gcc aac atg ctg tgg ctc gac 1200
Val Met Ser Ile Trp Asp Asp His Tyr Ala Asn Met Leu Trp Leu Asp
385 390 395 400

tcg acc tac ccg act gac gcc tct ccc gat gag ccc ggc aag ggc cgc 1248
Ser Thr Tyr Pro Thr Asp Ala Ser Pro Asp Glu Pro Gly Lys Gly Arg
405 410 415

ggc acc tgc gac acc agc tcg ggt gtt cct gct gac atc gag acc agc 1296
Gly Thr Cys Asp Thr Ser Ser Gly Val Pro Ala Asp Ile Glu Thr Ser
420 425 430

cag gcc agc aac tca gtc atc tac tcg aac atc aag ttc gga ccc atc 1344
Gln Ala Ser Asn Ser Val Ile Tyr Ser Asn Ile Lys Phe Gly Pro Ile
435 440 445

aac tcg acc ttc aag gcg tcc taa 1368
Asn Ser Thr Phe Lys Ala Ser
450 455



<210> 48 
<211> 455 
<212> PRT 

<213> Exidia glandulosa 
<400> 48 

Met Tyr Ala Lys Phe Ala Thr Leu Ala Ala Leu Val Ala Ala Ala Ser 
1 5 10 15

Ala Gln Gln Ala Cys Thr Leu Thr Ala Glu Asn His Pro Ser Met Thr
20 25 30 

Trp Ser Lys Cys Ala Ala Gly Gly Ser Cys Thr Ser Val Ser Gly Ser 
35 40 45 

Val Thr Ile Asp Ala Asn Trp Arg Trp Leu His Gln Leu Asn Ser Ala
50 55 60

Thr Asn Cys Tyr Asp Gly Asn Lys Trp Asn Thr Thr Tyr Cys Ser Thr
65 70 75 80

Asp Ala Thr Cys Ala Ala Gln Cys Cys Val Asp Gly Ser Asp Tyr Ala
85 90 95

Gly Thr Tyr Gly Ala Thr Thr Ser Gly Asn Ala Leu Asn Leu Lys Phe
100 105 110

Val Thr Gln Gly Ser Tyr Ser Lys Asn Ile Gly Ser Arg Leu Tyr Leu
115 120 125

Met Glu Ser Asp Thr Lys Tyr Gln Met Phe Gln Leu Leu Gly Gln Glu
130 135 140

Phe Thr Phe Asp Val Asp Val Ser Asn Leu Gly Cys Gly Leu Asn Gly 
145 150 155 160 

Ala Leu Tyr Phe Val Ser Met Asp Ala Asp Gly Gly Thr Ser Lys Tyr 
165 170 175 

Thr Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ser 
180 185 190 

Gln Cys Pro Arg Asp Leu Lys Phe Ile Asn Gly Gln Ala Asn Val Glu
195 200 205

Gly Trp Thr Pro Ser Thr Asn Asp Ala Asn Ala Gly Ile Gly Thr His
210 215 220

Gly Ser Cys Cys Ser Glu Met Asp Ile Trp Glu Ala Asn Asn Val Ala
225 230 235 240

Ala Ala Tyr Thr Pro His Pro Cys Thr Thr Ile Gly Gln Ser Ile Cys
245 250 255

Ser Gly Asp Ser Cys Gly Gly Thr Tyr Ser Ser Asp Arg Tyr Ala Gly 
260 265 270 

Val Cys Asp Pro Asp Gly Cys Asp Phe Asn Ser Tyr Arg Met Gly Asp 
275 280 285 

Thr Gly Phe Tyr Gly Lys Gly Leu Thr Val Asp Thr Ser Ser Lys Phe 
290 295 300 

Thr Val Val Thr Gln Phe Leu Thr Gly Ser Asp Gly Asn Leu Ser Glu
305 310 315 320

Ile Lys Arg Phe Tyr Val Gln Asn Gly Lys Val Ile Pro Asn Ser Gln
325 330 335

Ser Lys Ile Ala Gly Val Ser Gly Asn Ser Ile Thr Thr Asp Phe Cys
340 345 350

Ser Ala Gln Lys Thr Ala Phe Gly Asp Thr Asn Val Phe Ala Gln Lys
355 360 365

Gly Gly Leu Ala Gly Met Gly Ala Ala Leu Lys Ala Gly Met Val Leu 
370 375 380 

Val Met Ser Ile Trp Asp Asp His Tyr Ala Asn Met Leu Trp Leu Asp
385 390 395 400

Ser Thr Tyr Pro Thr Asp Ala Ser Pro Asp Glu Pro Gly Lys Gly Arg
405 410 415

Gly Thr Cys Asp Thr Ser Ser Gly Val Pro Ala Asp Ile Glu Thr Ser
420 425 430

Gln Ala Ser Asn Ser Val Ile Tyr Ser Asn Ile Lys Phe Gly Pro Ile
435 440 445

Asn Ser Thr Phe Lys Ala Ser
450 455



<210> 49 

<211> 1395 

<212> DNA 

<213> Poitrasia circinans 
<220> 

<221> CDS

<222> (1)..(1395) 

<223> 

<400> 49 

atg cat cag act tcc gtt ctt tct tcg ctc tct ttg ctc ctc gca gcc 48
Met His Gln Thr Ser Val Leu Ser Ser Leu Ser Leu Leu Leu Ala Ala
1 5 10 15

tcc ggt gcc cag cag gtc ggc acc cag aat gct gag act cac ccg agt 96
Ser Gly Ala Gln Gln Val Gly Thr Gln Asn Ala Glu Thr His Pro Ser
20 25 30

ctg acc acc cag aag tgt acc acc gac ggc ggc tgc acc gac cag tcc 144
Leu Thr Thr Gln Lys Cys Thr Thr Asp Gly Gly Cys Thr Asp Gln Ser
35 40 45

act gcc atc gtg ctt gac gcc aac tgg cgc tgg ctg cac acc acc gag 192
Thr Ala Ile Val Leu Asp Ala Asn Trp Arg Trp Leu His Thr Thr Glu
50 55 60

ggc tac acc aac tgc tac act ggc cag gaa tgg gac acc gac atc tgc 240
Gly Tyr Thr Asn Cys Tyr Thr Gly Gln Glu Trp Asp Thr Asp Ile Cys
65 70 75 80

tcc tcc ccg gag gct tgc gcc acc ggc tgc gct ctt gac ggt gcc gac 288
Ser Ser Pro Glu Ala Cys Ala Thr Gly Cys Ala Leu Asp Gly Ala Asp
85 90 95

tac gag ggc act tac ggc att acg act gac ggc aac gct ctt tcc atg 336
Tyr Glu Gly Thr Tyr Gly Ile Thr Thr Asp Gly Asn Ala Leu Ser Met
100 105 110

aag ttt gtc acc cag ggc tcg cag aag aac gtc ggc ggt cgt gtt tac 384
Lys Phe Val Thr Gln Gly Ser Gln Lys Asn Val Gly Gly Arg Val Tyr
115 120 125

ctg ctt gct ccc gac tcc gaa gat gcg tac gag ctc ttc aag ttg aag 432
Leu Leu Ala Pro Asp Ser Glu Asp Ala Tyr Glu Leu Phe Lys Leu Lys
130 135 140

aac cag gag ttc act ttc gac gtt gac gtc tcc gac ctc ccc tgc ggc 480
Asn Gln Glu Phe Thr Phe Asp Val Asp Val Ser Asp Leu Pro Cys Gly
145 150 155 160

ctg aac ggc gcc ctg tac ttc tcc gag atg gat gaa gat ggt ggc atg 528
Leu Asn Gly Ala Leu Tyr Phe Ser Glu Met Asp Glu Asp Gly Gly Met
165 170 175

tcc aag tac gag aac aac aag gcc ggc gcc aag tac ggc act ggc tac 576
Ser Lys Tyr Glu Asn Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr
180 185 190

tgc gac acg cag tgc ccc cac gac gtc aag ttc atc aac ggc gag gcc 624
Cys Asp Thr Gln Cys Pro His Asp Val Lys Phe Ile Asn Gly Glu Ala
195 200 205

aac att ctc aac tgg acc aag tcc gag acc gac gtc aac gcc ggc act 672
Asn Ile Leu Asn Trp Thr Lys Ser Glu Thr Asp Val Asn Ala Gly Thr
210 215 220

ggc caa tac ggc tcc tgc tgc aac gag atg gat atc tgg gag gcc aac 720
Gly Gln Tyr Gly Ser Cys Cys Asn Glu Met Asp Ile Trp Glu Ala Asn
225 230 235 240

tcg cag gcc acc gcc gtc act ccc cac gtc tgc aac gcc gat gtc atc 768
Ser Gln Ala Thr Ala Val Thr Pro His Val Cys Asn Ala Asp Val Ile
245 250 255

ggc cag gtc cgt tgc aac ggc acc gac tgc ggt gac ggc gac aac cgc 816 
Gly Gln Val Arg Cys Asn Gly Thr Asp Cys Gly Asp Gly Asp Asn Arg
260 265 270 

tac ggc ggc gtc tgc gac aag gat ggc tgc gac tac aac ccc tac cgc 864 
Tyr Gly Gly Val Cys Asp Lys Asp Gly Cys Asp Tyr Asn Pro Tyr Arg 
275 280 285 

atg ggc aac gag tcg ttc tac ggc tcc aac ggc agc acc atc gac acc 912
Met Gly Asn Glu Ser Phe Tyr Gly Ser Asn Gly Ser Thr Ile Asp Thr
290 295 300

act gcc aag ttc acc gtc att acg cag ttc atc acc tcg gac aac act 960
Thr Ala Lys Phe Thr Val Ile Thr Gln Phe Ile Thr Ser Asp Asn Thr
305 310 315 320

tcg act ggc gac ctc gtt gag atc cgc cgc aag tac gtc cag gac ggc 1008
Ser Thr Gly Asp Leu Val Glu Ile Arg Arg Lys Tyr Val Gln Asp Gly
325 330 335

acc gtc atc gag aac tcg ttc gcc gac tac gac acc ctg gcc acg ttc 1056
Thr Val Ile Glu Asn Ser Phe Ala Asp Tyr Asp Thr Leu Ala Thr Phe
340 345 350

aac tcc atc tcg gac gac ttc tgc gac gcc cag aag acg ctc ttc ggc 1104
Asn Ser Ile Ser Asp Asp Phe Cys Asp Ala Gln Lys Thr Leu Phe Gly
355 360 365 

gac gag aac gac ttc aag acc aag ggc ggc att gcc cgc atg ggc gag 1152
Asp Glu Asn Asp Phe Lys Thr Lys Gly Gly Ile Ala Arg Met Gly Glu
370 375 380

tcc ttc gag cgc ggc atg gtc ctc gtc atg agc atc tgg gat gac cac 1200
Ser Phe Glu Arg Gly Met Val Leu Val Met Ser Ile Trp Asp Asp His
385 390 395 400

gcg gcc aac gcc ctc tgg ctc gac tcg acc tac ccc gtc gac ggc gac 1248
Ala Ala Asn Ala Leu Trp Leu Asp Ser Thr Tyr Pro Val Asp Gly Asp
405 410 415

gcg acc aag cct ggc atc aag cgc ggc cct tgc ggc acc gac act ggt 1296
Ala Thr Lys Pro Gly Ile Lys Arg Gly Pro Cys Gly Thr Asp Thr Gly
420 425 430

gtt ccc gcc gac gtc gag tcg gag tcg ccc gat tcg acc gtc atc tac 1344
Val Pro Ala Asp Val Glu Ser Glu Ser Pro Asp Ser Thr Val Ile Tyr
435 440 445

tcc aac att cgc tac gga gac att ggc tcc acc ttc aac gcc acc gct 1392
Ser Asn Ile Arg Tyr Gly Asp Ile Gly Ser Thr Phe Asn Ala Thr Ala
450 455 460 

tag 1395 



<210> 50 

<211> 464 

<212> PRT 

<213> Poitrasia circinans 

<400> 50 

Met His Gln Thr Ser Val Leu Ser Ser Leu Ser Leu Leu Leu Ala Ala
1 5 10 15

Ser Gly Ala Gln Gln Val Gly Thr Gln Asn Ala Glu Thr His Pro Ser
20 25 30

Leu Thr Thr Gln Lys Cys Thr Thr Asp Gly Gly Cys Thr Asp Gln Ser
35 40 45

Thr Ala Ile Val Leu Asp Ala Asn Trp Arg Trp Leu His Thr Thr Glu
50 55 60

Gly Tyr Thr Asn Cys Tyr Thr Gly Gln Glu Trp Asp Thr Asp Ile Cys
65 70 75 80

Ser Ser Pro Glu Ala Cys Ala Thr Gly Cys Ala Leu Asp Gly Ala Asp 
85 90 95 

Tyr Glu Gly Thr Tyr Gly Ile Thr Thr Asp Gly Asn Ala Leu Ser Met
100 105 110

Lys Phe Val Thr Gln Gly Ser Gln Lys Asn Val Gly Gly Arg Val Tyr
115 120 125

Leu Leu Ala Pro Asp Ser Glu Asp Ala Tyr Glu Leu Phe Lys Leu Lys
130 135 140

Asn Gln Glu Phe Thr Phe Asp Val Asp Val Ser Asp Leu Pro Cys Gly
145 150 155 160

Leu Asn Gly Ala Leu Tyr Phe Ser Glu Met Asp Glu Asp Gly Gly Met 
165 170 175 

Ser Lys Tyr Glu Asn Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr 
180 185 190 

Cys Asp Thr Gln Cys Pro His Asp Val Lys Phe Ile Asn Gly Glu Ala
195 200 205

Asn Ile Leu Asn Trp Thr Lys Ser Glu Thr Asp Val Asn Ala Gly Thr
210 215 220

Gly Gln Tyr Gly Ser Cys Cys Asn Glu Met Asp Ile Trp Glu Ala Asn
225 230 235 240

Ser Gln Ala Thr Ala Val Thr Pro His Val Cys Asn Ala Asp Val Ile
245 250 255

Gly Gln Val Arg Cys Asn Gly Thr Asp Cys Gly Asp Gly Asp Asn Arg
260 265 270

Tyr Gly Gly Val Cys Asp Lys Asp Gly Cys Asp Tyr Asn Pro Tyr Arg 
275 280 285 

Met Gly Asn Glu Ser Phe Tyr Gly Ser Asn Gly Ser Thr Ile Asp Thr
290 295 300

Thr Ala Lys Phe Thr Val Ile Thr Gln Phe Ile Thr Ser Asp Asn Thr
305 310 315 320

Ser Thr Gly Asp Leu Val Glu Ile Arg Arg Lys Tyr Val Gln Asp Gly
325 330 335

Thr Val Ile Glu Asn Ser Phe Ala Asp Tyr Asp Thr Leu Ala Thr Phe
340 345 350

Asn Ser Ile Ser Asp Asp Phe Cys Asp Ala Gln Lys Thr Leu Phe Gly
355 360 365

Asp Glu Asn Asp Phe Lys Thr Lys Gly Gly Ile Ala Arg Met Gly Glu
370 375 380

Ser Phe Glu Arg Gly Met Val Leu Val Met Ser Ile Trp Asp Asp His
385 390 395 400

Ala Ala Asn Ala Leu Trp Leu Asp Ser Thr Tyr Pro Val Asp Gly Asp 
405 410 415 

Ala Thr Lys Pro Gly Ile Lys Arg Gly Pro Cys Gly Thr Asp Thr Gly
420 425 430

Val Pro Ala Asp Val Glu Ser Glu Ser Pro Asp Ser Thr Val Ile Tyr
435 440 445

Ser Asn Ile Arg Tyr Gly Asp Ile Gly Ser Thr Phe Asn Ala Thr Ala
450 455 460 



<210> 51 

<211> 1383

<212> DNA 

<213> Coprinus cinereus 
<220> 

<221> CDS 

<222> (1)..(1383) 

<223> 

<400> 51 



atg ttc aag aaa gtc gcc ctc acc gct ctc tgc ttc ctc gcc gtc gca 48
Met Phe Lys Lys Val Ala Leu Thr Ala Leu Cys Phe Leu Ala Val Ala
1 5 10 15

cag gcc caa cag gtc ggt cgc gaa gtc gct gaa aac cac ccc cgt ctc 96
Gln Ala Gln Gln Val Gly Arg Glu Val Ala Glu Asn His Pro Arg Leu
20 25 30

ccg tgg cag cgt tgc act cgc aac ggc gga tgc cag act gtc tcc aac 144
Pro Trp Gln Arg Cys Thr Arg Asn Gly Gly Cys Gln Thr Val Ser Asn
35 40 45

ggt cag gtc gtc ctc gac gcc aac tgg cga tgg ctc cac gtc acc gac 192
Gly Gln Val Val Leu Asp Ala Asn Trp Arg Trp Leu His Val Thr Asp
50 55 60

ggc tac acc aac tgc tac acc ggt aac tcc tgg aac agc acc gtc tgc 240
Gly Tyr Thr Asn Cys Tyr Thr Gly Asn Ser Trp Asn Ser Thr Val Cys
65 70 75 80

tcc gac ccc acc acc tgc gct cag cga tgc gct ctc gag ggt gcc aac 288
Ser Asp Pro Thr Thr Cys Ala Gln Arg Cys Ala Leu Glu Gly Ala Asn
85 90 95

tac cag caa acc tac ggt atc acc acc aac gga gac gcc ctc acc atc 336
Tyr Gln Gln Thr Tyr Gly Ile Thr Thr Asn Gly Asp Ala Leu Thr Ile
100 105 110

aag ttc ctc acc cga tcc caa caa acc aac gtc ggt gct cgt gtc tac 384
Lys Phe Leu Thr Arg Ser Gln Gln Thr Asn Val Gly Ala Arg Val Tyr
115 120 125

ctc atg gag aac gag aac cga tac cag atg ttc aac ctc ctc aac aag 432
Leu Met Glu Asn Glu Asn Arg Tyr Gln Met Phe Asn Leu Leu Asn Lys
130 135 140

gag ttc acc ttc gac gtt gac gtc tcc aag gtt cct tgc ggt atc aac 480
Glu Phe Thr Phe Asp Val Asp Val Ser Lys Val Pro Cys Gly Ile Asn
145 150 155 160

ggt gcc ctc tac ttc atc cag atg gac gcc gat ggt ggt atg agc aag 528
Gly Ala Leu Tyr Phe Ile Gln Met Asp Ala Asp Gly Gly Met Ser Lys
165 170 175

caa ccc aac aac agg gct ggt gct aag tac ggt acc ggc tac tgc gac 576
Gln Pro Asn Asn Arg Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp
180 185 190

tct cag tgc ccc cgt gac atc aag ttc att gac ggc gtg gcc aac agc 624
Ser Gln Cys Pro Arg Asp Ile Lys Phe Ile Asp Gly Val Ala Asn Ser
195 200 205

gcc gac tgg act cca tcc gag acc gat ccc aat gcc gga agg ggt cgc 672
Ala Asp Trp Thr Pro Ser Glu Thr Asp Pro Asn Ala Gly Arg Gly Arg
210 215 220

tac ggc att tgc tgc gcc gag atg gat atc tgg gag gcc aac tcc atc 720
Tyr Gly Ile Cys Cys Ala Glu Met Asp Ile Trp Glu Ala Asn Ser Ile
225 230 235 240

tcc aat gcc tac acc ccc cac cct tgc cga acc cag aac gat ggt ggc 768
Ser Asn Ala Tyr Thr Pro His Pro Cys Arg Thr Gln Asn Asp Gly Gly
245 250 255

tac cag cgc tgc gag ggc cgc gac tgc aac cag cct cgc tat gag ggt 816
Tyr Gln Arg Cys Glu Gly Arg Asp Cys Asn Gln Pro Arg Tyr Glu Gly
260 265 270

ctt tgc gat cct gat ggc tgt gac tac aac ccc ttc cgc atg ggt aac 864 
Leu Cys Asp Pro Asp Gly Cys Asp Tyr Asn Pro Phe Arg Met Gly Asn 
275 280 285 

aag gac ttc tac gga ccc gga aag acc gtc gac acc aac agg aag atg 912 
Lys Asp Phe Tyr Gly Pro Gly Lys Thr Val Asp Thr Asn Arg Lys Met 
290 295 300 

acc gtc gtc acc caa ttc atc acc cac gac aac acc gac act ggc acc 960
Thr Val Val Thr Gln Phe Ile Thr His Asp Asn Thr Asp Thr Gly Thr
305 310 315 320

ctc gtt gac atc cgc cgc ctc tac gtt caa gac ggc cgt gtc att gcc 1008
Leu Val Asp Ile Arg Arg Leu Tyr Val Gln Asp Gly Arg Val Ile Ala
325 330 335

aac cct ccc acc aac ttc ccc ggt ctc atg ccc gcc cac gac tcc atc 1056
Asn Pro Pro Thr Asn Phe Pro Gly Leu Met Pro Ala His Asp Ser Ile
340 345 350

acc gag cag ttc tgc act gac cag aag aac ctc ttc ggc gac tac agc 1104
Thr Glu Gln Phe Cys Thr Asp Gln Lys Asn Leu Phe Gly Asp Tyr Ser
355 360 365

agc ttc gct cgt gac ggt ggt ctc gct cac atg ggt cgc tcc ctc gcc 1152
Ser Phe Ala Arg Asp Gly Gly Leu Ala His Met Gly Arg Ser Leu Ala
370 375 380

aag ggt cac gtc ctc gct ctc tcc atc tgg aac gac cac ggt gcc cac 1200
Lys Gly His Val Leu Ala Leu Ser Ile Trp Asn Asp His Gly Ala His
385 390 395 400

atg ttg tgg ctc gac tcc aac tac ccc acc gac gct gac ccc aac aag 1248
Met Leu Trp Leu Asp Ser Asn Tyr Pro Thr Asp Ala Asp Pro Asn Lys
405 410 415

ccc ggt att gct cgt ggt acc tgc ccg acc act ggt ggc acc ccc cgt 1296
Pro Gly Ile Ala Arg Gly Thr Cys Pro Thr Thr Gly Gly Thr Pro Arg
420 425 430

gaa acc gaa caa aac cac cct gat gcc cag gtc atc ttc tcc aac att 1344
Glu Thr Glu Gln Asn His Pro Asp Ala Gln Val Ile Phe Ser Asn Ile
435 440 445

aaa ttc ggt gac atc ggc tcg act ttc tct ggt tac taa 1383
Lys Phe Gly Asp Ile Gly Ser Thr Phe Ser Gly Tyr
450 455 460 



<210> 52 
<211> 460 
<212> PRT 

<213> Coprinus cinereus 
<400> 52 

Met Phe Lys Lys Val Ala Leu Thr Ala Leu Cys Phe Leu Ala Val Ala 
1 5 10 15

Gln Ala Gln Gln Val Gly Arg Glu Val Ala Glu Asn His Pro Arg Leu
20 25 30

Pro Trp Gln Arg Cys Thr Arg Asn Gly Gly Cys Gln Thr Val Ser Asn
35 40 45

Gly Gln Val Val Leu Asp Ala Asn Trp Arg Trp Leu His Val Thr Asp
50 55 60

Gly Tyr Thr Asn Cys Tyr Thr Gly Asn Ser Trp Asn Ser Thr Val Cys 
65 70 75 80 

Ser Asp Pro Thr Thr Cys Ala Gln Arg Cys Ala Leu Glu Gly Ala Asn
85 90 95

Tyr Gln Gln Thr Tyr Gly Ile Thr Thr Asn Gly Asp Ala Leu Thr Ile
100 105 110

Lys Phe Leu Thr Arg Ser Gln Gln Thr Asn Val Gly Ala Arg Val Tyr
115 120 125

Leu Met Glu Asn Glu Asn Arg Tyr Gln Met Phe Asn Leu Leu Asn Lys
130 135 140

Glu Phe Thr Phe Asp Val Asp Val Ser Lys Val Pro Cys Gly Ile Asn
145 150 155 160

Gly Ala Leu Tyr Phe Ile Gln Met Asp Ala Asp Gly Gly Met Ser Lys
165 170 175

Gln Pro Asn Asn Arg Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp
180 185 190

Ser Gln Cys Pro Arg Asp Ile Lys Phe Ile Asp Gly Val Ala Asn Ser
195 200 205

Ala Asp Trp Thr Pro Ser Glu Thr Asp Pro Asn Ala Gly Arg Gly Arg 
210 215 220 

Tyr Gly Ile Cys Cys Ala Glu Met Asp Ile Trp Glu Ala Asn Ser Ile
225 230 235 240

Ser Asn Ala Tyr Thr Pro His Pro Cys Arg Thr Gln Asn Asp Gly Gly
245 250 255

Tyr Gln Arg Cys Glu Gly Arg Asp Cys Asn Gln Pro Arg Tyr Glu Gly
260 265 270

Leu Cys Asp Pro Asp Gly Cys Asp Tyr Asn Pro Phe Arg Met Gly Asn 
275 280 285 

Lys Asp Phe Tyr Gly Pro Gly Lys Thr Val Asp Thr Asn Arg Lys Met 
290 295 300 

Thr Val Val Thr Gln Phe Ile Thr His Asp Asn Thr Asp Thr Gly Thr
305 310 315 320

Leu Val Asp Ile Arg Arg Leu Tyr Val Gln Asp Gly Arg Val Ile Ala
325 330 335

Asn Pro Pro Thr Asn Phe Pro Gly Leu Met Pro Ala His Asp Ser Ile
340 345 350

Thr Glu Gln Phe Cys Thr Asp Gln Lys Asn Leu Phe Gly Asp Tyr Ser
355 360 365

Ser Phe Ala Arg Asp Gly Gly Leu Ala His Met Gly Arg Ser Leu Ala 
370 375 380 

Lys Gly His Val Leu Ala Leu Ser Ile Trp Asn Asp His Gly Ala His
385 390 395 400

Met Leu Trp Leu Asp Ser Asn Tyr Pro Thr Asp Ala Asp Pro Asn Lys 
405 410 415 

Pro Gly Ile Ala Arg Gly Thr Cys Pro Thr Thr Gly Gly Thr Pro Arg
420 425 430

Glu Thr Glu Gln Asn His Pro Asp Ala Gln Val Ile Phe Ser Asn Ile
435 440 445

Lys Phe Gly Asp Ile Gly Ser Thr Phe Ser Gly Tyr
450 455 460 



<210> 53 

<211> 1353 

<212> DNA 

<213> Acremonium sp. 
<220> 

<221> CDS 

<222> (1)..(1353) 

<223> 

<400> 53 

atg atg aag cag tat ctt cag tac ctg gcg gcg gct ctg ccc cta atg 48
Met Met Lys Gln Tyr Leu Gln Tyr Leu Ala Ala Ala Leu Pro Leu Met
1 5 10 15

ggc ctt gcc gcg ggc cag caa gcc ggc cgg gag acg ccc gaa aac cac 96
Gly Leu Ala Ala Gly Gln Gln Ala Gly Arg Glu Thr Pro Glu Asn His
20 25 30

ccc cgg ctc acc tgg aag aag tgc tcg ggc cag ggg tcc tgc cag acc 144
Pro Arg Leu Thr Trp Lys Lys Cys Ser Gly Gln Gly Ser Cys Gln Thr
35 40 45

gtc aac ggc gag gtc gtc att gat gcc aac tgg cgc tgg ctc cac gac 192
Val Asn Gly Glu Val Val Ile Asp Ala Asn Trp Arg Trp Leu His Asp
50 55 60

tcc aac atg cag aac tgc tac gac ggc aac cag tgg acc agc gcg tgc 240
Ser Asn Met Gln Asn Cys Tyr Asp Gly Asn Gln Trp Thr Ser Ala Cys
65 70 75 80

agc tcg gcc acc gac tgc gcc tcc aag tgc tac atc gag ggt gcc gac 288
Ser Ser Ala Thr Asp Cys Ala Ser Lys Cys Tyr Ile Glu Gly Ala Asp
85 90 95

tac ggc agg acc tac ggc gct tcg acg agc ggc gac tcc ctc acg ctc 336
Tyr Gly Arg Thr Tyr Gly Ala Ser Thr Ser Gly Asp Ser Leu Thr Leu
100 105 110

aag ttt gtc act cag cac gag tac ggt acc aac atc ggc tcg cgc ttc 384
Lys Phe Val Thr Gln His Glu Tyr Gly Thr Asn Ile Gly Ser Arg Phe
115 120 125

tac ctg atg agc agc ccg acc cgg tac cag atg ttc acc ctc atg aac 432
Tyr Leu Met Ser Ser Pro Thr Arg Tyr Gln Met Phe Thr Leu Met Asn
130 135 140

aac gaa ttt gct ttc gat gtc gac ctc tcg acc gtc gag tgc ggc atc 480
Asn Glu Phe Ala Phe Asp Val Asp Leu Ser Thr Val Glu Cys Gly Ile
145 150 155 160

aac agc gcc ctg tac ttc gtc gcc atg gag gag gac ggc ggc atg gcc 528
Asn Ser Ala Leu Tyr Phe Val Ala Met Glu Glu Asp Gly Gly Met Ala
165 170 175

agc tac ccc acc aac aag gcc gga gcc aag tac ggc acg ggt tac tgc 576
Ser Tyr Pro Thr Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys
180 185 190

gac gcc caa tgc gcc cgt gat ctc aag ttc gtc ggc ggc aag gcc aac 624
Asp Ala Gln Cys Ala Arg Asp Leu Lys Phe Val Gly Gly Lys Ala Asn
195 200 205

att gag ggc tgg agg ccg tcc acc aac gac gcg aac gcc ggc gtc ggc 672
Ile Glu Gly Trp Arg Pro Ser Thr Asn Asp Ala Asn Ala Gly Val Gly
210 215 220

ccg atg ggc ggc tgc tgc gcg gaa atc gat gtt tgg gag tcc aac gcc 720
Pro Met Gly Gly Cys Cys Ala Glu Ile Asp Val Trp Glu Ser Asn Ala
225 230 235 240

cac gct ttt gcc ttc acg ccg cac gcg tgc gag aac aac aac tac cac 768
His Ala Phe Ala Phe Thr Pro His Ala Cys Glu Asn Asn Asn Tyr His
245 250 255

atc tgc gag acc tcc aac tgc ggc ggt acc tac tcc gac gac cgc ttc 816
Ile Cys Glu Thr Ser Asn Cys Gly Gly Thr Tyr Ser Asp Asp Arg Phe
260 265 270

gcc ggc ctc tgc gac gcc aac ggc tgc gac tac aac ccg tac cgc atg 864 
Ala Gly Leu Cys Asp Ala Asn Gly Cys Asp Tyr Asn Pro Tyr Arg Met 
275 280 285 

ggc aac ccc gac ttc tac ggc aag ggc aag act ctt gac acc tcg cgg 912 
Gly Asn Pro Asp Phe Tyr Gly Lys Gly Lys Thr Leu Asp Thr Ser Arg 
290 295 300 

aag ttc acc gtc gtc acc cgc ttc cag gag aac gac ctc tcg cag tac 960 
Lys Phe Thr Val Val Thr Arg Phe Gln Glu Asn Asp Leu Ser Gln Tyr 
305 310 315 320 

ttc atc cag gac ggc cgc aag atc gag atc ccg ccc ccg acc tgg gac 1008 
Phe Ile Gln Asp Gly Arg Lys Ile Glu Ile Pro Pro Pro Thr Trp Asp 
325 330 335 

ggc ctc ccg aag agc agc cac atc acg ccc gag ctg tgc gcg acc cag 1056 
Gly Leu Pro Lys Ser Ser His Ile Thr Pro Glu Leu Cys Ala Thr Gln 
340 345 350 

ttc gac gtc ttc gac gac cgc aac cgc ttc gag gag gtc ggc ggc ttc 1104 
Phe Asp Val Phe Asp Asp Arg Asn Arg Phe Glu Glu Val Gly Gly Phe 
355 360 365 

ccc gcc ctc aac gcc gct ctc cgc atc ccc atg gtc ctt gtc atg tcc 1152 
Pro Ala Leu Asn Ala Ala Leu Arg Ile Pro Met Val Leu Val Met Ser 
370 375 380 

atc tgg gac gac cac tac gcc aac atg ctc tgg ctc gac tcc gtc tac 1200 
Ile Trp Asp Asp His Tyr Ala Asn Met Leu Trp Leu Asp Ser Val Tyr 
385 390 395 400 

ccg ccc gag aag gag ggc acc ccc ggc gcc gag cgt ggc cct tgc ccc 1248 
Pro Pro Glu Lys Glu Gly Thr Pro Gly Ala Glu Arg Gly Pro Cys Pro 
405 410 415 

cag acc tct ggt gtc ccc gcc gaa gtc gag gcc cag tac ccc aac gcc 1296 
Gln Thr Ser Gly Val Pro Ala Glu Val Glu Ala Gln Tyr Pro Asn Ala 
420 425 430 

aag gtc gtc tgg tcc aac atc cgc ttc ggc ccc atc ggc tcg acc tac 1344 
Lys Val Val Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thr Tyr 
435 440 445 

aac atg taa 1353 
Asn Met 
450 



<210> 54 

<211> 450 

<212> PRT 

<213> Acremonium sp. 

<400> 54 

Met Met Lys Gln Tyr Leu Gln Tyr Leu Ala Ala Ala Leu Pro Leu Met 
1 5 10 15 

Gly Leu Ala Ala Gly Gln Gln Ala Gly Arg Glu Thr Pro Glu Asn His 
20 25 30 

Pro Arg Leu Thr Trp Lys Lys Cys Ser Gly Gln Gly Ser Cys Gln Thr 
35 40 45 

Val Asn Gly Glu Val Val Ile Asp Ala Asn Trp Arg Trp Leu His Asp 
50 55 60 

Ser Asn Met Gln Asn Cys Tyr Asp Gly Asn Gln Trp Thr Ser Ala Cys 
65 70 75 80 

Ser Ser Ala Thr Asp Cys Ala Ser Lys Cys Tyr Ile Glu Gly Ala Asp 
85 90 95 

Tyr Gly Arg Thr Tyr Gly Ala Ser Thr Ser Gly Asp Ser Leu Thr Leu 
100 105 110 

Lys Phe Val Thr Gln His Glu Tyr Gly Thr Asn Ile Gly Ser Arg Phe 
115 120 125 

Tyr Leu Met Ser Ser Pro Thr Arg Tyr Gln Met Phe Thr Leu Met Asn 
130 135 140 

Asn Glu Phe Ala Phe Asp Val Asp Leu Ser Thr Val Glu Cys Gly Ile 
145 150 155 160 

Asn Ser Ala Leu Tyr Phe Val Ala Met Glu Glu Asp Gly Gly Met Ala 
165 170 175 

Ser Tyr Pro Thr Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys 
180 185 190 

Asp Ala Gln Cys Ala Arg Asp Leu Lys Phe Val Gly Gly Lys Ala Asn 
195 200 205 

Ile Glu Gly Trp Arg Pro Ser Thr Asn Asp Ala Asn Ala Gly Val Gly 
210 215 220 

Pro Met Gly Gly Cys Cys Ala Glu Ile Asp Val Trp Glu Ser Asn Ala 
225 230 235 240 

His Ala Phe Ala Phe Thr Pro His Ala Cys Glu Asn Asn Asn Tyr His 
245 250 255 

Ile Cys Glu Thr Ser Asn Cys Gly Gly Thr Tyr Ser Asp Asp Arg Phe 
260 265 270 

Ala Gly Leu Cys Asp Ala Asn Gly Cys Asp Tyr Asn Pro Tyr Arg Met 
275 280 285 

Gly Asn Pro Asp Phe Tyr Gly Lys Gly Lys Thr Leu Asp Thr Ser Arg 
290 295 300 

Lys Phe Thr Val Val Thr Arg Phe Gln Glu Asn Asp Leu Ser Gln Tyr 
305 310 315 320 

Phe Ile Gln Asp Gly Arg Lys Ile Glu Ile Pro Pro Pro Thr Trp Asp 
325 330 335 

Gly Leu Pro Lys Ser Ser His Ile Thr Pro Glu Leu Cys Ala Thr Gln 
340 345 350 

Phe Asp Val Phe Asp Asp Arg Asn Arg Phe Glu Glu Val Gly Gly Phe 
355 360 365 

Pro Ala Leu Asn Ala Ala Leu Arg Ile Pro Met Val Leu Val Met Ser 
370 375 380 

Ile Trp Asp Asp His Tyr Ala Asn Met Leu Trp Leu Asp Ser Val Tyr 
385 390 395 400 

Pro Pro Glu Lys Glu Gly Thr Pro Gly Ala Glu Arg Gly Pro Cys Pro 
405 410 415 

Gln Thr Ser Gly Val Pro Ala Glu Val Glu Ala Gln Tyr Pro Asn Ala 
420 425 430 

Lys Val Val Trp Ser Asn Ile Arg Phe Gly Pro Ile Gly Ser Thr Tyr 
435 440 445 

Asn Met 
450 

<210> 55 

<211> 1599 

<212> DNA 

<213> Chaetomidium pingtungium 
<220> 

<221> CDS 

<222> (1)..(1599) 

<223> 

<400> 55 

atg ctg gcc tcc acc ttc tcc tac cgc atg tac aag acc gcg ctc atc 48 
Met Leu Ala Ser Thr Phe Ser Tyr Arg Met Tyr Lys Thr Ala Leu Ile 
1 5 10 15 

ctg gcc gcc ctt ctg ggc tct ggc cag gct cag cag gtc ggt act tcc 96 
Leu Ala Ala Leu Leu Gly Ser Gly Gln Ala Gln Gln Val Gly Thr Ser 
20 25 30 

cag gcg gaa gtg cat ccg tcc atg acc tgg cag agc tgc acg gct ggc 144 
Gln Ala Glu Val His Pro Ser Met Thr Trp Gln Ser Cys Thr Ala Gly 
35 40 45 

ggc agc tgc acc acc aac aac ggc aag gtg gtc atc gac gcg aac tgg 192 
Gly Ser Cys Thr Thr Asn Asn Gly Lys Val Val Ile Asp Ala Asn Trp 
50 55 60 

cgt tgg gtg cac aaa gtc ggc gac tac acc aac tgc tac acc ggc aac 240 
Arg Trp Val His Lys Val Gly Asp Tyr Thr Asn Cys Tyr Thr Gly Asn 
65 70 75 80 

acc tgg gac acg act atc tgc cct gac gat gcg acc tgc gca tcc aac 288 
Thr Trp Asp Thr Thr Ile Cys Pro Asp Asp Ala Thr Cys Ala Ser Asn 
85 90 95 

tgc gcc ctt gag ggt gcc aac tac gaa tcc acc tat ggt gtg acc gcc 336 
Cys Ala Leu Glu Gly Ala Asn Tyr Glu Ser Thr Tyr Gly Val Thr Ala 
100 105 110 

agc ggc aat tcc ctc cgc ctc aac ttc gtc acc acc agc cag cag aag 384 
Ser Gly Asn Ser Leu Arg Leu Asn Phe Val Thr Thr Ser Gln Gln Lys 
115 120 125 

aac att ggc tcg cgt ctg tac atg atg aag gac gac tcg acc tac gag 432 
Asn Ile Gly Ser Arg Leu Tyr Met Met Lys Asp Asp Ser Thr Tyr Glu 
130 135 140 

atg ttt aag ctg ctg aac cag gag ttc acc ttc gat gtc gat gtc tcc 480 
Met Phe Lys Leu Leu Asn Gln Glu Phe Thr Phe Asp Val Asp Val Ser 
145 150 155 160 

aac ctc ccc tgc ggt ctc aac ggt gct ctg tac ttt gtc gcc atg gac 528 
Asn Leu Pro Cys Gly Leu Asn Gly Ala Leu Tyr Phe Val Ala Met Asp 
165 170 175 

gcc ggc ggt ggc atg tcc aag tac cca acc aac aag gcc ggt gcc aag 576 
Ala Gly Gly Gly Met Ser Lys Tyr Pro Thr Asn Lys Ala Gly Ala Lys 
180 185 190 

tac ggt act gga tac tgt gac tcg cag tgc cct cgc gac ctc aag ttc 624 
Tyr Gly Thr Gly Tyr Cys Asp Ser Gln Cys Pro Arg Asp Leu Lys Phe 
195 200 205 

atc aac ggt cag gcc aac gtt gaa ggg tgg cag ccc tcc tcc aac gat 672 
Ile Asn Gly Gln Ala Asn Val Glu Gly Trp Gln Pro Ser Ser Asn Asp 
210 215 220 

gcc aat gcg ggt acc ggc aac cac ggg tcc tgc tgc gcg gag atg gat 720 
Ala Asn Ala Gly Thr Gly Asn His Gly Ser Cys Cys Ala Glu Met Asp 
225 230 235 240 

atc tgg gag gcc aac agc atc tcc acg gcc ttc acc ccc cat ccg tgc 768 
Ile Trp Glu Ala Asn Ser Ile Ser Thr Ala Phe Thr Pro His Pro Cys 
245 250 255 

gac acg ccc ggc cag gtg atg tgc acc ggt gat gcc tgc ggt ggc acc 816 
Asp Thr Pro Gly Gln Val Met Cys Thr Gly Asp Ala Cys Gly Gly Thr 
260 265 270 

tac agc tcc gac cgc tac ggc ggc acc tgc gac ccc gac gga tgt gat 864 
Tyr Ser Ser Asp Arg Tyr Gly Gly Thr Cys Asp Pro Asp Gly Cys Asp 
275 280 285 

ttc aac tcc ttc cgc cag ggc aac aag acc ttc tac ggc cct ggc atg 912 
Phe Asn Ser Phe Arg Gln Gly Asn Lys Thr Phe Tyr Gly Pro Gly Met 
290 295 300 

acc gtc gac acc aag agc aag ttt acc gtc gtc acc cag ttc atc acc 960 
Thr Val Asp Thr Lys Ser Lys Phe Thr Val Val Thr Gln Phe Ile Thr 
305 310 315 320 

gac gac ggc acc tcc agc ggc acc ctc aag gag atc aag cgc ttc tac 1008 
Asp Asp Gly Thr Ser Ser Gly Thr Leu Lys Glu Ile Lys Arg Phe Tyr 
325 330 335 

gtg cag aac ggc aag gtg atc ccc aac tcg gag tcg acc tgg acc ggc 1056 
Val Gln Asn Gly Lys Val Ile Pro Asn Ser Glu Ser Thr Trp Thr Gly 
340 345 350 

gtc agc ggc aac tcc atc acc acc gag tac tgc acc gcc cag aag agc 1104 
Val Ser Gly Asn Ser Ile Thr Thr Glu Tyr Cys Thr Ala Gln Lys Ser 
355 360 365 

ctg ttc cag gac cag aac gtc ttc gaa aag cac ggc ggc ctc gag ggc 1152 
Leu Phe Gln Asp Gln Asn Val Phe Glu Lys His Gly Gly Leu Glu Gly 
370 375 380 

atg ggt gct gcc ctc gcc cag ggc atg gtt ctc gtc atg tcc ctg tgg 1200 
Met Gly Ala Ala Leu Ala Gln Gly Met Val Leu Val Met Ser Leu Trp 
385 390 395 400 

gat gat cac tcg gcc aac atg ctc tgg ctc gac agc aac tac ccg acc 1248 
Asp Asp His Ser Ala Asn Met Leu Trp Leu Asp Ser Asn Tyr Pro Thr 
405 410 415 

act gcc tct tcc acc act ccc ggc gtc gcc cgt ggt acc tgc gac atc 1296 
Thr Ala Ser Ser Thr Thr Pro Gly Val Ala Arg Gly Thr Cys Asp Ile 
420 425 430 

tcc tcc ggc gtc cct gcg gat gtc gag gcg aac cac ccc gac gcc tac 1344 
Ser Ser Gly Val Pro Ala Asp Val Glu Ala Asn His Pro Asp Ala Tyr 
435 440 445 

gtc gtc tac tcc aac atc aag gtc ggc ccc atc ggc tcg acc ttc aac 1392 
Val Val Tyr Ser Asn Ile Lys Val Gly Pro Ile Gly Ser Thr Phe Asn 
450 455 460 

agc ggt ggc tcg aac ccc ggt ggc gga acc acc acg aca act acc acc 1440 
Ser Gly Gly Ser Asn Pro Gly Gly Gly Thr Thr Thr Thr Thr Thr Thr 
465 470 475 480 

cag cct act acc acc acg acc acg gct gga aac cct ggc ggc acc gga 1488 
Gln Pro Thr Thr Thr Thr Thr Thr Ala Gly Asn Pro Gly Gly Thr Gly 
485 490 495 

gtc gca cag cac tat ggc cag tgt ggt gga atc gga tgg acc gga ccc 1536 
Val Ala Gln His Tyr Gly Gln Cys Gly Gly Ile Gly Trp Thr Gly Pro 
500 505 510 

aca acc tgt gcc agc cct tat acc tgc cag aag ctg aat gat tat tac 1584 
Thr Thr Cys Ala Ser Pro Tyr Thr Cys Gln Lys Leu Asn Asp Tyr Tyr 
515 520 525 

tct cag tgc ctg tag 1599 
Ser Gln Cys Leu 
530 



<210> 56 

<211> 532 

<212> PRT 

<213> Chaetomidium pingtungium 

<400> 56 



Met Leu Ala Ser Thr Phe Ser Tyr Arg Met Tyr Lys Thr Ala Leu Ile 
1 5 10 15 

Leu Ala Ala Leu Leu Gly Ser Gly Gln Ala Gln Gln Val Gly Thr Ser 
20 25 30 

Gln Ala Glu Val His Pro Ser Met Thr Trp Gln Ser Cys Thr Ala Gly 
35 40 45 

Gly Ser Cys Thr Thr Asn Asn Gly Lys Val Val Ile Asp Ala Asn Trp 
50 55 60 

Arg Trp Val His Lys Val Gly Asp Tyr Thr Asn Cys Tyr Thr Gly Asn 
65 70 75 80 

Thr Trp Asp Thr Thr Ile Cys Pro Asp Asp Ala Thr Cys Ala Ser Asn 
85 90 95 

Cys Ala Leu Glu Gly Ala Asn Tyr Glu Ser Thr Tyr Gly Val Thr Ala 
100 105 110 

Ser Gly Asn Ser Leu Arg Leu Asn Phe Val Thr Thr Ser Gln Gln Lys 
115 120 125 

Asn Ile Gly Ser Arg Leu Tyr Met Met Lys Asp Asp Ser Thr Tyr Glu 
130 135 140 



Met Phe Lys Leu Leu Asn Gln Glu Phe Thr Phe Asp Val Asp Val Ser 
145 150 155 160 

Asn Leu Pro Cys Gly Leu Asn Gly Ala Leu Tyr Phe Val Ala Met Asp 
165 170 175 

Ala Gly Gly Gly Met Ser Lys Tyr Pro Thr Asn Lys Ala Gly Ala Lys 
180 185 190 

Tyr Gly Thr Gly Tyr Cys Asp Ser Gln Cys Pro Arg Asp Leu Lys Phe 
195 200 205 

Ile Asn Gly Gln Ala Asn Val Glu Gly Trp Gln Pro Ser Ser Asn Asp 
210 215 220 

Ala Asn Ala Gly Thr Gly Asn His Gly Ser Cys Cys Ala Glu Met Asp 
225 230 235 240 

Ile Trp Glu Ala Asn Ser Ile Ser Thr Ala Phe Thr Pro His Pro Cys 
245 250 255 

Asp Thr Pro Gly Gln Val Met Cys Thr Gly Asp Ala Cys Gly Gly Thr 
260 265 270 

Tyr Ser Ser Asp Arg Tyr Gly Gly Thr Cys Asp Pro Asp Gly Cys Asp 
275 280 285 

Phe Asn Ser Phe Arg Gln Gly Asn Lys Thr Phe Tyr Gly Pro Gly Met 
290 295 300 

Thr Val Asp Thr Lys Ser Lys Phe Thr Val Val Thr Gln Phe Ile Thr 
305 310 315 320 

Asp Asp Gly Thr Ser Ser Gly Thr Leu Lys Glu Ile Lys Arg Phe Tyr 
325 330 335 

Val Gln Asn Gly Lys Val Ile Pro Asn Ser Glu Ser Thr Trp Thr Gly 
340 345 350 

Val Ser Gly Asn Ser Ile Thr Thr Glu Tyr Cys Thr Ala Gln Lys Ser 
355 360 365 

Leu Phe Gln Asp Gln Asn Val Phe Glu Lys His Gly Gly Leu Glu Gly 
370 375 380 

Met Gly Ala Ala Leu Ala Gln Gly Met Val Leu Val Met Ser Leu Trp 
385 390 395 400 

Asp Asp His Ser Ala Asn Met Leu Trp Leu Asp Ser Asn Tyr Pro Thr 
405 410 415 

Thr Ala Ser Ser Thr Thr Pro Gly Val Ala Arg Gly Thr Cys Asp Ile 
420 425 430 

Ser Ser Gly Val Pro Ala Asp Val Glu Ala Asn His Pro Asp Ala Tyr 
435 440 445 

Val Val Tyr Ser Asn Ile Lys Val Gly Pro Ile Gly Ser Thr Phe Asn 
450 455 460 

Ser Gly Gly Ser Asn Pro Gly Gly Gly Thr Thr Thr Thr Thr Thr Thr 
465 470 475 480 

Gln Pro Thr Thr Thr Thr Thr Thr Ala Gly Asn Pro Gly Gly Thr Gly 
485 490 495 

Val Ala Gln His Tyr Gly Gln Cys Gly Gly Ile Gly Trp Thr Gly Pro 
500 505 510 

Thr Thr Cys Ala Ser Pro Tyr Thr Cys Gln Lys Leu Asn Asp Tyr Tyr 
515 520 525 

Ser Gln Cys Leu 
530 



<210> 57 

<211> 1383 

<212> DNA 

<213> Sporotrichum pruinosum 
<220> 

<221> CDS 

<222> (1)..(1383) 

<223> 

<400> 57 

atg ttc aag aaa gtc gcc ctc acc gct ctc tgc ttc ctc gcc gtc gca 48 
Met Phe Lys Lys Val Ala Leu Thr Ala Leu Cys Phe Leu Ala Val Ala 
1 5 10 15 

cag gcc caa cag gtc ggt cgc gaa gtc gct gaa aac cac ccc cgt ctc 96 
Gln Ala Gln Gln Val Gly Arg Glu Val Ala Glu Asn His Pro Arg Leu 
20 25 30 

ccg tgg cag cgt tgc act cgc aac ggc gga tgc cag act gtc tct aac 144 
Pro Trp Gln Arg Cys Thr Arg Asn Gly Gly Cys Gln Thr Val Ser Asn 
35 40 45 

ggt cag gtc gtc ctc gac gcc aac tgg cga tgg ctc cac gtc acc gat 192 
Gly Gln Val Val Leu Asp Ala Asn Trp Arg Trp Leu His Val Thr Asp 
50 55 60 

ggc tac acc aac tgc tac acc ggt aac tcc tgg aac agc acc gtc tgc 240 
Gly Tyr Thr Asn Cys Tyr Thr Gly Asn Ser Trp Asn Ser Thr Val Cys 
65 70 75 80 

tcc gac ccc acc acc tgc gct cag cga tgc gct ctc gag ggt gcc aac 288 
Ser Asp Pro Thr Thr Cys Ala Gln Arg Cys Ala Leu Glu Gly Ala Asn 
85 90 95 

tac cag caa acc tac ggt atc acc acc aac gga gac gcc ctc acc atc 336 
Tyr Gln Gln Thr Tyr Gly Ile Thr Thr Asn Gly Asp Ala Leu Thr Ile 
100 105 110 

aag ttc ctc acc cga tcc caa caa acc aac gtc ggt gct cgt gtc tac 384 
Lys Phe Leu Thr Arg Ser Gln Gln Thr Asn Val Gly Ala Arg Val Tyr 
115 120 125 

ctc atg gag aac gag aac cga tac cag atg ttc aac ctc ctc aac aag 432 
Leu Met Glu Asn Glu Asn Arg Tyr Gln Met Phe Asn Leu Leu Asn Lys 
130 135 140 

gag ttc acc ttc gac gtt gac gtc tcc aag gtt cct tgc ggt atc aac 480 
Glu Phe Thr Phe Asp Val Asp Val Ser Lys Val Pro Cys Gly Ile Asn 
145 150 155 160 

ggt gcc ctc tac ttc atc cag atg gac gcc gat ggt ggt atg agc aag 528 
Gly Ala Leu Tyr Phe Ile Gln Met Asp Ala Asp Gly Gly Met Ser Lys 
165 170 175 

caa ccc aac aac agg gct ggt gct aag tac ggt acc ggc tac tgc gac 576 
Gln Pro Asn Asn Arg Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp 
180 185 190 

tct cag tgc ccc cgt gac atc aag ttc att gac ggc gtg gcc aac agc 624 
Ser Gln Cys Pro Arg Asp Ile Lys Phe Ile Asp Gly Val Ala Asn Ser 
195 200 205 

gcc gac tgg act cca tcc gag acc gat ccc aat gcc gga agg ggt cgc 672 
Ala Asp Trp Thr Pro Ser Glu Thr Asp Pro Asn Ala Gly Arg Gly Arg 
210 215 220 

tac ggc att tgc tgc gcc gag atg gat atc tgg gag gcc aac tcc atc 720 
Tyr Gly Ile Cys Cys Ala Glu Met Asp Ile Trp Glu Ala Asn Ser Ile 
225 230 235 240 

tcc aat gcc tac acc ccc cac cct tgc cga acc cag aac gat ggt ggc 768 
Ser Asn Ala Tyr Thr Pro His Pro Cys Arg Thr Gln Asn Asp Gly Gly 
245 250 255 

tac cag cgc tgc gag ggc cgc gac tgc aac cag cct cgc tat gag ggt 816 
Tyr Gln Arg Cys Glu Gly Arg Asp Cys Asn Gln Pro Arg Tyr Glu Gly 
260 265 270 

ctt tgc gat cct gat ggc tgt gac tac aac ccc ttc cgc atg ggt aac 864 
Leu Cys Asp Pro Asp Gly Cys Asp Tyr Asn Pro Phe Arg Met Gly Asn 
275 280 285 

aag gac ttc tac gga ccc gga aag acc atc gac acc aac agg aag atg 912 
Lys Asp Phe Tyr Gly Pro Gly Lys Thr Ile Asp Thr Asn Arg Lys Met 
290 295 300 

acc gtc gtc acc caa ttc atc acc cac gac aac acc gac act ggc acc 960 
Thr Val Val Thr Gln Phe Ile Thr His Asp Asn Thr Asp Thr Gly Thr 
305 310 315 320 

ctc gtt gac atc cgc cgc ctc tac gtt caa gac ggc cgt gtc att gcc 1008 
Leu Val Asp Ile Arg Arg Leu Tyr Val Gln Asp Gly Arg Val Ile Ala 
325 330 335 

aac cct ccc acc aac ttc ccc ggt ctc atg ccc gcc cac gac tcc atc 1056 
Asn Pro Pro Thr Asn Phe Pro Gly Leu Met Pro Ala His Asp Ser Ile 
340 345 350 

acc gag cag ttc tgc act gac cag aag aac ctc ttc ggc gac tac agc 1104 
Thr Glu Gln Phe Cys Thr Asp Gln Lys Asn Leu Phe Gly Asp Tyr Ser 
355 360 365 

agc ttc gct cgt gac ggt ggt ctc gct cac atg ggt cgc tcc ctc gcc 1152 
Ser Phe Ala Arg Asp Gly Gly Leu Ala His Met Gly Arg Ser Leu Ala 
370 375 380 

aag ggt cac gtc ctc gct ctc tcc atc tgg aac gac cac ggt gcc cac 1200 
Lys Gly His Val Leu Ala Leu Ser Ile Trp Asn Asp His Gly Ala His 
385 390 395 400 

atg ttg tgg ctc gac tcc aac tac ccc acc gac gct gac ccc aac aag 1248 
Met Leu Trp Leu Asp Ser Asn Tyr Pro Thr Asp Ala Asp Pro Asn Lys 
405 410 415 

ccc ggt att gct cgt ggt acc tgc ccg acc act ggt ggc acc ccc cgt 1296 
Pro Gly Ile Ala Arg Gly Thr Cys Pro Thr Thr Gly Gly Thr Pro Arg 
420 425 430 

gaa acc gaa caa aac cac cct gat gcc cag gtc atc ttc tcc aac att 1344 
Glu Thr Glu Gln Asn His Pro Asp Ala Gln Val Ile Phe Ser Asn Ile 
435 440 445 

aaa ttc ggt gac atc ggc tcg act ttc tct ggt tac taa 1383 
Lys Phe Gly Asp Ile Gly Ser Thr Phe Ser Gly Tyr 
450 455 460 



<210> 58 

<211> 460 

<212> PRT 

<213> Sporotrichum pruinosum 

<400> 58 

Met Phe Lys Lys Val Ala Leu Thr Ala Leu Cys Phe Leu Ala Val Ala 
1 5 10 15 

Gln Ala Gln Gln Val Gly Arg Glu Val Ala Glu Asn His Pro Arg Leu 
20 25 30 

Pro Trp Gln Arg Cys Thr Arg Asn Gly Gly Cys Gln Thr Val Ser Asn 
35 40 45 

Gly Gln Val Val Leu Asp Ala Asn Trp Arg Trp Leu His Val Thr Asp 
50 55 60 

Gly Tyr Thr Asn Cys Tyr Thr Gly Asn Ser Trp Asn Ser Thr Val Cys 
65 70 75 80 

Ser Asp Pro Thr Thr Cys Ala Gln Arg Cys Ala Leu Glu Gly Ala Asn 
85 90 95 

Tyr Gln Gln Thr Tyr Gly Ile Thr Thr Asn Gly Asp Ala Leu Thr Ile 
100 105 110 

Lys Phe Leu Thr Arg Ser Gln Gln Thr Asn Val Gly Ala Arg Val Tyr 
115 120 125 

Leu Met Glu Asn Glu Asn Arg Tyr Gln Met Phe Asn Leu Leu Asn Lys 
130 135 140 

Glu Phe Thr Phe Asp Val Asp Val Ser Lys Val Pro Cys Gly Ile Asn 
145 150 155 160 

Gly Ala Leu Tyr Phe Ile Gln Met Asp Ala Asp Gly Gly Met Ser Lys 
165 170 175 

Gln Pro Asn Asn Arg Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp 
180 185 190 

Ser Gln Cys Pro Arg Asp Ile Lys Phe Ile Asp Gly Val Ala Asn Ser 
195 200 205 

Ala Asp Trp Thr Pro Ser Glu Thr Asp Pro Asn Ala Gly Arg Gly Arg 
210 215 220 

Tyr Gly Ile Cys Cys Ala Glu Met Asp Ile Trp Glu Ala Asn Ser Ile 
225 230 235 240 

Ser Asn Ala Tyr Thr Pro His Pro Cys Arg Thr Gln Asn Asp Gly Gly 
245 250 255 

Tyr Gln Arg Cys Glu Gly Arg Asp Cys Asn Gln Pro Arg Tyr Glu Gly 
260 265 270 

Leu Cys Asp Pro Asp Gly Cys Asp Tyr Asn Pro Phe Arg Met Gly Asn 
275 280 285 

Lys Asp Phe Tyr Gly Pro Gly Lys Thr Ile Asp Thr Asn Arg Lys Met 
290 295 300 

Thr Val Val Thr Gln Phe Ile Thr His Asp Asn Thr Asp Thr Gly Thr 
305 310 315 320 

Leu Val Asp Ile Arg Arg Leu Tyr Val Gln Asp Gly Arg Val Ile Ala 
325 330 335 

Asn Pro Pro Thr Asn Phe Pro Gly Leu Met Pro Ala His Asp Ser Ile 
340 345 350 

Thr Glu Gln Phe Cys Thr Asp Gln Lys Asn Leu Phe Gly Asp Tyr Ser 
355 360 365 

Ser Phe Ala Arg Asp Gly Gly Leu Ala His Met Gly Arg Ser Leu Ala 
370 375 380 

Lys Gly His Val Leu Ala Leu Ser Ile Trp Asn Asp His Gly Ala His 
385 390 395 400 

Met Leu Trp Leu Asp Ser Asn Tyr Pro Thr Asp Ala Asp Pro Asn Lys 
405 410 415 

Pro Gly Ile Ala Arg Gly Thr Cys Pro Thr Thr Gly Gly Thr Pro Arg 
420 425 430 

Glu Thr Glu Gln Asn His Pro Asp Ala Gln Val Ile Phe Ser Asn Ile 
435 440 445 

Lys Phe Gly Asp Ile Gly Ser Thr Phe Ser Gly Tyr 
450 455 460 



<210> 59 

<211> 1578 

<212> DNA 

<213> Scytalidium thermophilum 
<220> 

<221> CDS 

<222> (1)..(1578) 

<223> 

<400> 59 

atg cgt acc gcc aag ttc gcc acc ctc gcc gcc ctt gtg gcc tcg gcc 48 
Met Arg Thr Ala Lys Phe Ala Thr Leu Ala Ala Leu Val Ala Ser Ala 
1 5 10 15 

gcc gcc cag cag gcg tgc agt ctc acc acc gag agg cac cct tcc ctc 96 
Ala Ala Gln Gln Ala Cys Ser Leu Thr Thr Glu Arg His Pro Ser Leu 
20 25 30 

tct tgg aag aag tgc acc gcc ggc ggc cag tgc cag acc gtc cag gct 144 
Ser Trp Lys Lys Cys Thr Ala Gly Gly Gln Cys Gln Thr Val Gln Ala 
35 40 45 

tcc atc act ctc gac tcc aac tgg cgc tgg act cac cag gtg tct ggc 192 
Ser Ile Thr Leu Asp Ser Asn Trp Arg Trp Thr His Gln Val Ser Gly 
50 55 60 

tcc acc aac tgc tac acg ggc aac aag tgg gat act agc atc tgc act 240 
Ser Thr Asn Cys Tyr Thr Gly Asn Lys Trp Asp Thr Ser Ile Cys Thr 
65 70 75 80 

gat gcc aag tcg tgc gct cag aac tgc tgc gtc gat ggt gcc gac tac 288 
Asp Ala Lys Ser Cys Ala Gln Asn Cys Cys Val Asp Gly Ala Asp Tyr 
85 90 95 

acc agc acc tat ggc atc acc acc aac ggt gat tcc ctg agc ctc aag 336 
Thr Ser Thr Tyr Gly Ile Thr Thr Asn Gly Asp Ser Leu Ser Leu Lys 
100 105 110 

ttc gtc acc aag ggc cag cac tcg acc aac gtc ggc tcg cgt acc tac 384 
Phe Val Thr Lys Gly Gln His Ser Thr Asn Val Gly Ser Arg Thr Tyr 
115 120 125 

ctg atg gac ggc gag gac aag tat cag acc ttc gag ctc ctc ggc aac 432 
Leu Met Asp Gly Glu Asp Lys Tyr Gln Thr Phe Glu Leu Leu Gly Asn 
130 135 140 

gag ttc acc ttc gat gtc gat gtc tcc aac atc ggc tgc ggt ctc aac 480 
Glu Phe Thr Phe Asp Val Asp Val Ser Asn Ile Gly Cys Gly Leu Asn 
145 150 155 160 

ggc gcc ctg tac ttc gtc tcc atg gac gcc gat ggt ggt ctc agc cgc 528 
Gly Ala Leu Tyr Phe Val Ser Met Asp Ala Asp Gly Gly Leu Ser Arg 
165 170 175 

tat cct ggc aac aag gct ggt gcc aag tac ggt acc ggc tac tgc gat 576 
Tyr Pro Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp 
180 185 190 

gct cag tgc ccc cgt gac atc aag ttc atc aac ggc gag gcc aac att 624 
Ala Gln Cys Pro Arg Asp Ile Lys Phe Ile Asn Gly Glu Ala Asn Ile 
195 200 205 

gag ggc tgg acc ggc tcc acc aac gac ccc aac gcc ggc gcg ggc cgc 672 
Glu Gly Trp Thr Gly Ser Thr Asn Asp Pro Asn Ala Gly Ala Gly Arg 
210 215 220 

tat ggt acc tgc tgc tct gag atg gat atc tgg gaa gcc aac aac atg 720 
Tyr Gly Thr Cys Cys Ser Glu Met Asp Ile Trp Glu Ala Asn Asn Met 
225 230 235 240 

gct act gcc ttc act cct cac cct tgc acc atc att ggc cag agc cgc 768 
Ala Thr Ala Phe Thr Pro His Pro Cys Thr Ile Ile Gly Gln Ser Arg 
245 250 255 

tgc gag ggc gac tcg tgc ggt ggc acc tac agc aac gag cgc tac gcc 816 
Cys Glu Gly Asp Ser Cys Gly Gly Thr Tyr Ser Asn Glu Arg Tyr Ala 
260 265 270 

ggc gtc tgc gac ccc gat ggc tgc gac ttc aac tcg tac cgc cag ggc 864 
Gly Val Cys Asp Pro Asp Gly Cys Asp Phe Asn Ser Tyr Arg Gln Gly 
275 280 285 

aat aag acc ttc tac ggc aag ggc atg acc gtc gac acc acc aag aag 912 
Asn Lys Thr Phe Tyr Gly Lys Gly Met Thr Val Asp Thr Thr Lys Lys 
290 295 300 

atc act gtc gtc acc cag ttc ctc aag gat gcc aac ggc gat ctc ggc 960 
Ile Thr Val Val Thr Gln Phe Leu Lys Asp Ala Asn Gly Asp Leu Gly 
305 310 315 320 

gag gtc aag cgc ttc tac gtc cag gat ggc aag atc atc ccc aac tcc 1008 
Glu Val Lys Arg Phe Tyr Val Gln Asp Gly Lys Ile Ile Pro Asn Ser 
325 330 335 

gag tcc acc atc ccc ggc gtc gag ggc aat tcc atc acc cag gac tgg 1056 
Glu Ser Thr Ile Pro Gly Val Glu Gly Asn Ser Ile Thr Gln Asp Trp 
340 345 350 

tgc gac cgc cag aag gtt gcc ttt ggc gac att gac gac ttc aac cgc 1104 
Cys Asp Arg Gln Lys Val Ala Phe Gly Asp Ile Asp Asp Phe Asn Arg 
355 360 365 

aag ggc ggc atg aag cag atg ggc aag gcc ctc gcc ggc ccc atg gtc 1152 
Lys Gly Gly Met Lys Gln Met Gly Lys Ala Leu Ala Gly Pro Met Val 
370 375 380 

ctg gtc atg tcc atc tgg gat gac cac gcc tcc aac atg ctc tgg ctc 1200 
Leu Val Met Ser Ile Trp Asp Asp His Ala Ser Asn Met Leu Trp Leu 
385 390 395 400 

gac tcg acc ttc cct gtc gat gcc gct ggc aag ccc ggc gcc gag cgc 1248 
Asp Ser Thr Phe Pro Val Asp Ala Ala Gly Lys Pro Gly Ala Glu Arg 
405 410 415 

ggt gcc tgc ccg acc acc tcg ggt gtc cct gct gag gtt gag gcc gag 1296 
Gly Ala Cys Pro Thr Thr Ser Gly Val Pro Ala Glu Val Glu Ala Glu 
420 425 430 

gcc ccc aac agc aac gtc gtc ttc tcc aac atc cgc ttc ggc ccc atc 1344 
Ala Pro Asn Ser Asn Val Val Phe Ser Asn Ile Arg Phe Gly Pro Ile 
435 440 445 

ggc tcg acc gtt gct ggt ctc ccc ggc gcg ggc aac ggc ggc aac aac 1392 
Gly Ser Thr Val Ala Gly Leu Pro Gly Ala Gly Asn Gly Gly Asn Asn 
450 455 460 

ggc ggc aac ccc ccg ccc ccc acc acc acc acc tcc tcg gct ccg gcc 1440 
Gly Gly Asn Pro Pro Pro Pro Thr Thr Thr Thr Ser Ser Ala Pro Ala 
465 470 475 480 

acc acc acc acc gcc agc gct ggc ccc aag gct ggc cac tgg cag cag 1488 
Thr Thr Thr Thr Ala Ser Ala Gly Pro Lys Ala Gly His Trp Gln Gln 
485 490 495 

tgc ggc ggc atc ggc ttc act ggc ccg acc cag tgc gag gag ccc tac 1536 
Cys Gly Gly Ile Gly Phe Thr Gly Pro Thr Gln Cys Glu Glu Pro Tyr 
500 505 510 

act tgc acc aag ctc aac gac tgg tac tct cag tgc ctg taa 1578 
Thr Cys Thr Lys Leu Asn Asp Trp Tyr Ser Gln Cys Leu 
515 520 525 



<210> 60 
<211> 525 
<212> PRT 

<213> Scytalidium thermophilum 
<400> 60 

Met Arg Thr Ala Lys Phe Ala Thr Leu Ala Ala Leu Val Ala Ser Ala 
1 5 10 15 

Ala Ala Gln Gln Ala Cys Ser Leu Thr Thr Glu Arg His Pro Ser Leu 
20 25 30 

Ser Trp Lys Lys Cys Thr Ala Gly Gly Gln Cys Gln Thr Val Gln Ala 
35 40 45 

Ser Ile Thr Leu Asp Ser Asn Trp Arg Trp Thr His Gln Val Ser Gly 
50 55 60 

Ser Thr Asn Cys Tyr Thr Gly Asn Lys Trp Asp Thr Ser Ile Cys Thr 
65 70 75 80 

Asp Ala Lys Ser Cys Ala Gln Asn Cys Cys Val Asp Gly Ala Asp Tyr 
85 90 95 

Thr Ser Thr Tyr Gly Ile Thr Thr Asn Gly Asp Ser Leu Ser Leu Lys 
100 105 110 

Phe Val Thr Lys Gly Gln His Ser Thr Asn Val Gly Ser Arg Thr Tyr 
115 120 125 

Leu Met Asp Gly Glu Asp Lys Tyr Gln Thr Phe Glu Leu Leu Gly Asn 
130 135 140 

Glu Phe Thr Phe Asp Val Asp Val Ser Asn Ile Gly Cys Gly Leu Asn 
145 150 155 160 

Gly Ala Leu Tyr Phe Val Ser Met Asp Ala Asp Gly Gly Leu Ser Arg 
165 170 175 

Tyr Pro Gly Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp 
180 185 190 

Ala Gln Cys Pro Arg Asp Ile Lys Phe Ile Asn Gly Glu Ala Asn Ile 
195 200 205 

Glu Gly Trp Thr Gly Ser Thr Asn Asp Pro Asn Ala Gly Ala Gly Arg 
210 215 220 

Tyr Gly Thr Cys Cys Ser Glu Met Asp Ile Trp Glu Ala Asn Asn Met 
225 230 235 240 

Ala Thr Ala Phe Thr Pro His Pro Cys Thr Ile Ile Gly Gln Ser Arg 
245 250 255 



Cys Glu Gly Asp Ser Cys Gly Gly Thr Tyr Ser Asn Glu Arg Tyr Ala 
260 265 270 

Gly Val Cys Asp Pro Asp Gly Cys Asp Phe Asn Ser Tyr Arg Gln Gly 
275 280 285 

Asn Lys Thr Phe Tyr Gly Lys Gly Met Thr Val Asp Thr Thr Lys Lys 
290 295 300 

Ile Thr Val Val Thr Gln Phe Leu Lys Asp Ala Asn Gly Asp Leu Gly 
305 310 315 320 

Glu Val Lys Arg Phe Tyr Val Gln Asp Gly Lys Ile Ile Pro Asn Ser 
325 330 335 

Glu Ser Thr Ile Pro Gly Val Glu Gly Asn Ser Ile Thr Gln Asp Trp 
340 345 350 

Cys Asp Arg Gln Lys Val Ala Phe Gly Asp Ile Asp Asp Phe Asn Arg 
355 360 365 

Lys Gly Gly Met Lys Gln Met Gly Lys Ala Leu Ala Gly Pro Met Val 
370 375 380 

Leu Val Met Ser Ile Trp Asp Asp His Ala Ser Asn Met Leu Trp Leu 
385 390 395 400 

Asp Ser Thr Phe Pro Val Asp Ala Ala Gly Lys Pro Gly Ala Glu Arg 
405 410 415 

Gly Ala Cys Pro Thr Thr Ser Gly Val Pro Ala Glu Val Glu Ala Glu 
420 425 430 

Ala Pro Asn Ser Asn Val Val Phe Ser Asn Ile Arg Phe Gly Pro Ile 
435 440 445 

Gly Ser Thr Val Ala Gly Leu Pro Gly Ala Gly Asn Gly Gly Asn Asn 
450 455 460 

Gly Gly Asn Pro Pro Pro Pro Thr Thr Thr Thr Ser Ser Ala Pro Ala 
465 470 475 480 

Thr Thr Thr Thr Ala Ser Ala Gly Pro Lys Ala Gly His Trp Gln Gln 
485 490 495 

Cys Gly Gly Ile Gly Phe Thr Gly Pro Thr Gln Cys Glu Glu Pro Tyr 
500 505 510 

Thr Cys Thr Lys Leu Asn Asp Trp Tyr Ser Gln Cys Leu 
515 520 525 



<210> 61 

<211> 519 

<212> DNA 

<213> Aspergillus sp. 
<220> 

<221> misc_feature 

<222> (1)..(519) 

<223> Partial CBH1 encoding sequence 
<400> 61 

gagatggaca tatgggaggc caacagcatc tccacggcct tcacgcccca cccctgcgat 60 

gtccccggcc aggtgatgtg cgagggcgac tcctgcggtg gcacctacag cagcgaccgc 120 

tatggcggca cctgcgatcc cgatggatgt gacttcaact cctaccgcca gggcaacaag 180 

tccttctacg gccccggcat gaccgtcgac accaacagca aggtcaccgt cgtgactcag 240 

ttcctcaccg acgacggcac tgccaccggc accctgtcgg agatcaagcg gttctacgtg 300 

cagaacggca aggtcatccc caactccgag tcgacctggc ccggcgtcgg cggcaactcc 360 

atcaccaccg actactgtct ggcccagaag agcctcttcg gcgataccga cgtcttcacc 420 

aagcacggcg gtatggaggg catgggcgcc gccctcgccg agggcatggt cctcgtcctg 480 

agtctctggg acgaccacca ctccaacatg ctctggctg 519 

<210> 62 

<211> 497 

<212> DNA 

<213> Scopulariopsis sp. 
<220> 

<221> misc_feature 

<222> (1)..(497) 

<223> Partial CBH1 encoding sequence 

<400> 62 



gagatcgatg tgtgggagtc gaacgcctat gccttcgttt tcacgccgca cgcgtgcacg 60 

accaacgagt accacgtctg cgagaccacc aactgcggtg gcacctactc ggaggaccgc 120 

ttcaccggca agtgcgacgc caacggctgc gactacaacc cctaccgcat gggcaacccc 180 

gacttctacg gcaagggcaa gacgctcgac accagccgca agttcaccgt cgtctcccgc 240 

ttcgaggaga acaagctctc ccagtacttc atccaggacg gccgcaagat cgagatcccg 300 

ccgccgacgt gggagggcat gcccaacagc agcgagatca cccccgagct ctgctccacc 360 

atgttcgatg tgctcgacga ccgcaaccgc ttgcaggagg tcggcggctt cgagcagctg 420 

aacaacgccc tccgggttcc catggtcctc gtcatgtcca tctgggacga ccactacgcc 480 

aacatgctct ggctcga 497 



<210> 63 

<211> 498 

<212> DNA 

<213> Fusarium sp. 

<220> 

<221> misc_feature 

<222> (1)..(498) 

<223> Partial CBH1 encoding sequence 
<400> 63 



gagatggata tctgggaggc caacaagatc tccactgcct acactcccca cccctgcaag 60 

agcctcaccc agcagtcctg cgagggcgat gcctgcggtg gcacctactc tactacccgc 120 

tatgctggaa cttQcaaccc caatacrttac 180 

accttctacg gccccggctc cggcttcaac gttgatacca ccaagaaggt gactgtcgtg 240 

acccagttca tcaagggcag cgacggcaag ctttccgaga tcaagcgtct ctatgttcag 300 

aatggcaagg tcattggcaa cccccagtct gagattgcca gcaaccctgg cagcagcgtc 360 

accgacagct tctgcaaggc ccagaaggtt gccttcaacg accccgatga cttcaacaag 420 

aagggtggct ggagcggaat gagcgacgcc ctcgccaagc ccatggttct cgtcatgagc 480 

ttgtggcacg acgtgagt 498 

<210> 64 
<211> 525 
<212> DNA 

<213> Verticillium sp. 
<220> 

<221> misc_feature 
<222> (1)..(525) 

<223> Partial CBH1 encoding sequence 
<400> 64 

gagatggata tctgggaggc caacaagatc tccacggcct acactcccca tccctgcaag 60 
agcctcaccc agcagtcctg tgagggcgat gcctgcggtg gcacctactc ttccacccgc 120 
tatgctggaa cttgcgatcc cgatggctgc gatttcaacc cttaccgcca gggcaaccac 180 
accttctacg gtcccggctc cggcttcaac gtcgatacca ccaagaaggt gactgtcgtg 240 
acccagttca tcaagggcag cgacggcaag ctttccgaga tcaagcgtct ctatgttcag 300 
aatggcaagg tcatcggcaa cccccagtcc gagattgcaa acaaccccgg cagctccgtc 360 
accgacagct tctgcaaggc ccagaaggtt gccttcaacg accccgatga cttcaacaag 420 
aagggtggct ggagcggcat gaacgacgcc ctcgccaagc ccatggttct cgtcatgagc 480 
ctgtggcacg acgtgagtaa tctaacccct gagtctcgga caaga 525 

<210> 65 

<211> 1371 

<212> DNA 

<213> Pseudoplectania nigrella 
<220> 

<221> CDS 

<222> (1)..(1371) 
<223> 

<400> 65 

atg cta tcc aat ctc ctt ctc tca ctc tct ttc ctt tcc cta gcc tcc 48 
Met Leu Ser Asn Leu Leu Leu Ser Leu Ser Phe Leu Ser Leu Ala Ser 
1 5 10 15 

ggg caa aac atc ggt acc aac acc gcc gaa agc cac ccc caa ctt cgt 96 
Gly Gln Asn Ile Gly Thr Asn Thr Ala Glu Ser His Pro Gln Leu Arg 
20 25 30 

tct caa acc tgc acc aaa ggc aac gga tgc agc acc caa tcc acc tcc 144 
Ser Gln Thr Cys Thr Lys Gly Asn Gly Cys Ser Thr Gln Ser Thr Ser 
35 40 45 

gta gtc ctg gac tcc aac tgg cgc tgg ctg cac aat aat gga ggt tca 192 
Val Val Leu Asp Ser Asn Trp Arg Trp Leu His Asn Asn Gly Gly Ser 
50 55 60 

acg aac tgc tac acc ggc aat tcc tgg gac tct aca tta tgt ccc gac 240 
Thr Asn Cys Tyr Thr Gly Asn Ser Trp Asp Ser Thr Leu Cys Pro Asp 
65 70 75 80 

cca gtt acc tgc gcc aag aac tgt gct ctc gac ggt gcc gac tat tct 288 
Pro Val Thr Cys Ala Lys Asn Cys Ala Leu Asp Gly Ala Asp Tyr Ser 
85 90 95 

ggg aca tac gga atc acc tct acg gga gat gct ttg acg ttg aag ttt 336 
Gly Thr Tyr Gly Ile Thr Ser Thr Gly Asp Ala Leu Thr Leu Lys Phe 
100 105 110 

gtt act cag ggt cct tat tcg act aat att gga tct cgg gta tac cta 384 
Val Thr Gln Gly Pro Tyr Ser Thr Asn Ile Gly Ser Arg Val Tyr Leu 
115 120 125 

atg gcg agt gat act cag tat aag atg ttc cag ctc aag aac aag gag 432 
Met Ala Ser Asp Thr Gln Tyr Lys Met Phe Gln Leu Lys Asn Lys Glu 
130 135 140 

ttt acg ttt gat gtt gat gtc tct aat ctt cct tgt gga tta aac gga 480 
Phe Thr Phe Asp Val Asp Val Ser Asn Leu Pro Cys Gly Leu Asn Gly 
145 150 155 160

gcg ttg tat ttt gtg gag atg gat gcg gat gga gga atg tcg aaa tac 528
Ala Leu Tyr Phe Val Glu Met Asp Ala Asp Gly Gly Met Ser Lys Tyr 
165 170 175 

ccg tct aat aaa gcc ggg gca aaa tat gga acc ggg tat tgt gat gcg 576
Pro Ser Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ala
180 185 190

cag tgt cca cat gat atc aaa ttt atc aac ggg gag gca aat ctc cta 624
Gln Cys Pro His Asp Ile Lys Phe Ile Asn Gly Glu Ala Asn Leu Leu
195 200 205 

gac tgg acg cct tca acc agc gac aaa aat gcc ggc tcc gga cgt tac 672
Asp Trp Thr Pro Ser Thr Ser Asp Lys Asn Ala Gly Ser Gly Arg Tyr 
210 215 220 

ggg acc tgt tgt caa gaa atg gac atc tgg gaa gcc aac agc atg gca 720
Gly Thr Cys Cys Gln Glu Met Asp Ile Trp Glu Ala Asn Ser Met Ala
225 230 235 240 

acc gcc tat aca ccg cat ccc tgt agt gtc tca gga cct acc cga tgc 768
Thr Ala Tyr Thr Pro His Pro Cys Ser Val Ser Gly Pro Thr Arg Cys
245 250 255

tca gga acc caa tgt ggg gat ggt tct aac cgt cat aac gga att tgc 816
Ser Gly Thr Gln Cys Gly Asp Gly Ser Asn Arg His Asn Gly Ile Cys
260 265 270 

gat aaa gat ggc tgc gat ttc aat tcc tac cgt atg ggc aat acg aca 864
Asp Lys Asp Gly Cys Asp Phe Asn Ser Tyr Arg Met Gly Asn Thr Thr 
275 280 285 

ttc ttc ggc aag gga gca acg gtt aac acc aac tcc aaa ttt act gtt 912
Phe Phe Gly Lys Gly Ala Thr Val Asn Thr Asn Ser Lys Phe Thr Val 
290 295 300 

gta acg caa ttc atc acc tcc gac aac acc tca act gga gcg cta aag 960
Val Thr Gln Phe Ile Thr Ser Asp Asn Thr Ser Thr Gly Ala Leu Lys
305 310 315 320 

gag att cgt cgt ctt tat att cag aat gga aaa gtc atc cag aac tcg 1008
Glu Ile Arg Arg Leu Tyr Ile Gln Asn Gly Lys Val Ile Gln Asn Ser
325 330 335 

aaa agt aat atc tcc ggc atg tca gct tac gac tct ata acc gag gat 1056
Lys Ser Asn Ile Ser Gly Met Ser Ala Tyr Asp Ser Ile Thr Glu Asp
340 345 350 

ttc tgt gcc gct caa aaa acc gca ttt gga gac aca aat gac ttt aag 1104
Phe Cys Ala Ala Gln Lys Thr Ala Phe Gly Asp Thr Asn Asp Phe Lys
355 360 365 

gca aag ggc gga ttt aca aac ctt ggg aat gcg ttg caa aag gga atg 1152 
Ala Lys Gly Gly Phe Thr Asn Leu Gly Asn Ala Leu Gln Lys Gly Met
370 375 380 

gtt ttg gcg ttg agt att tgg gat gat cat gct gcg cag atg ctt tgg 1200
Val Leu Ala Leu Ser Ile Trp Asp Asp His Ala Ala Gln Met Leu Trp
385 390 395 400 

ttg gat agt tct tac ccg ctc gat aaa gac cct tct caa cca ggt gtt 1248
Leu Asp Ser Ser Tyr Pro Leu Asp Lys Asp Pro Ser Gln Pro Gly Val
405 410 415 

aag agg ggc gcg tgt gct acc tct tct ggt aaa ccg tcg gat gtc gag 1296
Lys Arg Gly Ala Cys Ala Thr Ser Ser Gly Lys Pro Ser Asp Val Glu 
420 425 430 

aac cag tct ccg aat gcg tcg gtg act ttt tcg aac att aag ttt ggg 1344
Asn Gln Ser Pro Asn Ala Ser Val Thr Phe Ser Asn Ile Lys Phe Gly
435 440 445 

gat att gga tcg act tat tcc tct tag 1371
Asp Ile Gly Ser Thr Tyr Ser Ser
450 455 



<210> 66 
<211> 456 
<212> PRT

<213> Pseudoplectania nigrella 
<400> 66 

Met Leu Ser Asn Leu Leu Leu Ser Leu Ser Phe Leu Ser Leu Ala Ser 
1 5 10 15 

Gly Gln Asn Ile Gly Thr Asn Thr Ala Glu Ser His Pro Gln Leu Arg
20 25 30 

Ser Gln Thr Cys Thr Lys Gly Asn Gly Cys Ser Thr Gln Ser Thr Ser
35 40 45 

Val Val Leu Asp Ser Asn Trp Arg Trp Leu His Asn Asn Gly Gly Ser 
50 55 60 

Thr Asn Cys Tyr Thr Gly Asn Ser Trp Asp Ser Thr Leu Cys Pro Asp 
65 70 75 80 

Pro Val Thr Cys Ala Lys Asn Cys Ala Leu Asp Gly Ala Asp Tyr Ser 
85 90 95 

Gly Thr Tyr Gly Ile Thr Ser Thr Gly Asp Ala Leu Thr Leu Lys Phe
100 105 110 

Val Thr Gln Gly Pro Tyr Ser Thr Asn Ile Gly Ser Arg Val Tyr Leu
115 120 125 

Met Ala Ser Asp Thr Gln Tyr Lys Met Phe Gln Leu Lys Asn Lys Glu
130 135 140 

Phe Thr Phe Asp Val Asp Val Ser Asn Leu Pro Cys Gly Leu Asn Gly 
145 150 155 160

Ala Leu Tyr Phe Val Glu Met Asp Ala Asp Gly Gly Met Ser Lys Tyr 
165 170 175

Pro Ser Asn Lys Ala Gly Ala Lys Tyr Gly Thr Gly Tyr Cys Asp Ala 
180 185 190

Gln Cys Pro His Asp Ile Lys Phe Ile Asn Gly Glu Ala Asn Leu Leu
195 200 205 

Asp Trp Thr Pro Ser Thr Ser Asp Lys Asn Ala Gly Ser Gly Arg Tyr 
210 215 220 

Gly Thr Cys Cys Gln Glu Met Asp Ile Trp Glu Ala Asn Ser Met Ala
225 230 235 240 

Thr Ala Tyr Thr Pro His Pro Cys Ser Val Ser Gly Pro Thr Arg Cys 
245 250 255 

Ser Gly Thr Gln Cys Gly Asp Gly Ser Asn Arg His Asn Gly Ile Cys
260 265 270 

Asp Lys Asp Gly Cys Asp Phe Asn Ser Tyr Arg Met Gly Asn Thr Thr 
275 280 285 

Phe Phe Gly Lys Gly Ala Thr Val Asn Thr Asn Ser Lys Phe Thr Val 
290 295 300 
Val Thr Gln Phe Ile Thr Ser Asp Asn Thr Ser Thr Gly Ala Leu Lys
305 310 315 320 

Glu Ile Arg Arg Leu Tyr Ile Gln Asn Gly Lys Val Ile Gln Asn Ser
325 330 335 

Lys Ser Asn Ile Ser Gly Met Ser Ala Tyr Asp Ser Ile Thr Glu Asp
340 345 350 

Phe Cys Ala Ala Gln Lys Thr Ala Phe Gly Asp Thr Asn Asp Phe Lys
355 360 365 

Ala Lys Gly Gly Phe Thr Asn Leu Gly Asn Ala Leu Gln Lys Gly Met
370 375 380 

Val Leu Ala Leu Ser Ile Trp Asp Asp His Ala Ala Gln Met Leu Trp
385 390 395 400 

Leu Asp Ser Ser Tyr Pro Leu Asp Lys Asp Pro Ser Gln Pro Gly Val
405 410 415 

Lys Arg Gly Ala Cys Ala Thr Ser Ser Gly Lys Pro Ser Asp Val Glu 
420 425 430 

Asn Gln Ser Pro Asn Ala Ser Val Thr Phe Ser Asn Ile Lys Phe Gly
435 440 445 

Asp Ile Gly Ser Thr Tyr Ser Ser
450 455 

<210> 67 
<211> 951 
<212> DNA

<213> Phytophthora infestans 
<220> 

<221> misc_feature 
<222> (1)..(951) 

<223> Partial CBH1 encoding sequence
<400> 67 

tgcgatgctg atggttgtga cttcaactct taccgccagg gtaacacctc tttctatggt 60 
gcaggtctta ccgtgaacac caacaaagtt ttcaccgttg taacccaatt catcaccaac 120 
gatggaacag cttcaggtac cttgaaagaa atccgacgat tctatgttca gaatggcgtc 180 
gtgattccaa actcgcaatc cacaatcgct ggagttccag gaaattccat caccgactct 240 
ttctgtgccg cacaaaagac tgcttttggt gacaccaacg aattcgctac taagggaggt 300 
cttgccacaa tgagcaaagc tttggcaaag ggtatggtac ttgtcatgtc catttgggat 360 
gaccataccg ccaacatgtt gtggctcgat gccccttacc cagcaaccaa atccccaagc 420 
gccccaggtg tcactcgagg atcatgcagt gctacttcag gtaaccccgt tgatgttgaa 480 
gccaattctc caggttcttc cgtcaccttc tcaaacatca agtggggtcc catcaactct 540 
acctacactg gatctggagc cgccccaagt gttccaggca ctacaaccgt tagctcggca 600 
cccgcatcga ctgcaacttc aggagctggt ggtgtcgcta agtatgccca atgtggaggt 660

actggataca gtggagctac cgcttgcgtt tcaggcagca cctgtgttgc cctcaaccct 720 

tactactccc aatgccaata gattgtttcc ctcaggagca attaggtttc caacctaagg 780 

ggagagatct tcacaagtct gtacataggg tcagctaaat gttgatcatt catattcttt 840 

catgtattta gttgttgaca atttgaagtt gcaagtcaag acgggaaaac agaagcagga 900 

aatatatggg acataacaaa gtcaatcgtt tacataagaa ccttctttaa a 951 